Demo: Findings vs No Findings Model

Phase 01 models, built on TensorFlow- and Keras-backed stacked BiLSTM architectures, follow a similar training scheme that includes:

  1. Loading the data

  2. Defining model constants

  3. Tokenizing the data using a GloVe embedding

  4. Defining the model

  5. Training the model

  6. Evaluating the results

This demo provides a notebook with extended annotations as a more in-depth guide to understanding the code, using dummy data. The complete code can be found in /path/to/repo/src/nmrezman/phase01/train/general.py; this notebook can be found in /path/to/repo/examples/phase01.

Load the Data

Here the data is loaded. During Phase 01 development, the data was preprocessed beforehand via the code block below: we preprocessed the notes once, saved off the result, and then trained from that dataframe. In this example, we employ a similar workflow to best match the source code provided.

Notably, the preprocessing includes (i) lowercasing the report text, (ii) extracting the “impression” / “findings” portion of the report based on keywords in the report, (iii) removing doctor signatures, and (iv) removing any new lines. This general utility is available as nmrezman.utils.preprocess_input. Note that you will likely need to modify this function to best match the formatting of the reports in your hospital network and/or to account for extra blank text, new lines, etc. introduced by your system / cloud platform(s). A short illustrative sketch of this kind of preprocessing follows the workflow code below.

import os
import pandas as pd
import joblib
from nmrezman import utils

# Load the raw notes
# NOTE: paths resolve relative to the working directory
base_path = os.path.dirname("__file__")
data_path = os.path.abspath(os.path.join(base_path, "..", "demo_data.csv"))
df = pd.read_csv(data_path)

# Preprocess each note (lowercase, extract findings, remove signatures, etc.)
df["new_note"] = df["note"].apply(lambda x: utils.preprocess_input(x, is_phase_2=False))

# Save off the preprocessed dataframe for training
joblib.dump(df, os.path.abspath(os.path.join(base_path, "..", "demo_data.gz")))
df.to_csv(os.path.abspath(os.path.join(base_path, "..", "demo_data.csv")), index=False)
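For reference, preprocess_input roughly performs steps (i)-(iv) above. The following is a minimal, illustrative sketch of that kind of preprocessing, not the actual implementation; the keyword and signature markers are assumptions based on the sample reports shown below.

import re

def preprocess_report_sketch(text: str) -> str:
    # Hypothetical stand-in for nmrezman.utils.preprocess_input
    text = text.lower()                                   # (i) lowercase the report text
    # (ii) keep the text after the first "findings:" / "impression:" keyword
    for keyword in ("findings:", "impression:"):
        if keyword in text:
            text = text.split(keyword, 1)[-1]
            break
    # (iii) drop the trailing signature block (assumed to begin at "final report")
    text = text.split("final report", 1)[0]
    # (iv) collapse new lines and repeated whitespace
    return re.sub(r"\s+", " ", text).strip()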
[1]:
import os
import joblib
from IPython.display import display, HTML

# Define the path to the data
base_path = os.path.dirname("__file__")
data_path = os.path.abspath(os.path.join(base_path, "..", "demo_data.gz"))

# Import data
# NOTE: this data has already been preprocessed, extracting the findings, removing Dr signature, etc.
# See `from ..utils import preprocess_input`
modeling_df = joblib.load(data_path)

# Get preprocessed notes and labels (X and y)
X = modeling_df["new_note"]
labels = [0 if i == "No Findings" else 1 for i in modeling_df["selected_finding"]]

display(HTML(modeling_df.head(3).to_html()))
rpt_num note selected_finding selected_proc selected_label new_note
0 1 PROCEDURE: CT CHEST WO CONTRAST. HISTORY: Wheezing TECHNIQUE: Non-contrast helical thoracic CT was performed. COMPARISON: There is no prior chest CT for comparison. FINDINGS: Support Devices: None. Heart/Pericardium/Great Vessels: Cardiac size is normal. There is no calcific coronary artery atherosclerosis. There is no pericardial effusion. The aorta is normal in diameter. The main pulmonary artery is normal in diameter. Pleural Spaces: Few small pleural calcifications are present in the right pleura for example on 2/62 and 3/76. The pleural spaces are otherwise clear. Mediastinum/Hila: There is no mediastinal or hilar lymph node enlargement. Subcentimeter minimally calcified paratracheal lymph nodes are likely related to prior granulomas infection. Neck Base/Chest Wall/Diaphragm/Upper Abdomen: There is no supraclavicular or axillary lymph node enlargement. Limited, non-contrast imaging through the upper abdomen is within normal limits. Mild degenerative change is present in the spine. Lungs/Central Airways: There is a 15 mm nodular density in the nondependent aspect of the bronchus intermedius on 2/52. The trachea and central airways are otherwise clear. There is mild diffuse bronchial wall thickening. There is a calcified granuloma in the posterior right upper lobe. The lungs are otherwise clear. CONCLUSIONS: 1. There is mild diffuse bronchial wall thickening suggesting small airways disease such as asthma or bronchitis in the appropriate clinical setting. 2. A 3 mm nodular soft tissue attenuation in the nondependent aspect of the right bronchus intermedius is nonspecific, which could be mucus or abnormal soft tissue. A follow-up CT in 6 months might be considered to evaluate the growth. 3. Stigmata of old granulomatous disease is present.   FINAL REPORT Attending Radiologist: Lung Findings CT Chest A 3 mm nodular soft tissue attenuation in the nondependent aspect of the right bronchus intermedius is nonspecific, which could be mucus or abnormal soft tissue. A follow-up CT in 6 months might be considered to evaluate the growth. support devices: none. heart/pericardium/great vessels: cardiac size is normal. there is no calcific coronary artery atherosclerosis. there is no pericardial effusion. the aorta is normal in diameter. the main pulmonary artery is normal in diameter. pleural spaces: few small pleural calcifications are present in the right pleura for example on 2/62 and 3/76. the pleural spaces are otherwise clear. mediastinum/hila: there is no mediastinal or hilar lymph node enlargement. subcentimeter minimally calcified paratracheal lymph nodes are likely related to prior granulomas infection. neck base/chest wall/diaphragm/upper abdomen: there is no supraclavicular or axillary lymph node enlargement. limited, non-contrast imaging through the upper abdomen is within normal limits. mild degenerative change is present in the spine. lungs/central airways: there is a 15 mm nodular density in the nondependent aspect of the bronchus intermedius on 2/52. the trachea and central airways are otherwise clear. there is mild diffuse bronchial wall thickening. there is a calcified granuloma in the posterior right upper lobe. the lungs are otherwise clear. conclusions: 1. there is mild diffuse bronchial wall thickening suggesting small airways disease such as asthma or bronchitis in the appropriate clinical setting. 2. a 3 mm nodular soft tissue attenuation in the nondependent aspect of the right bronchus intermedius is nonspecific, which could be mucus or abnormal soft tissue. 
a follow-up ct in 6 months might be considered to evaluate the growth. 3. stigmata of old granulomatous disease is present.
1 2 PROCEDURE: CT ABDOMEN PELVIS W CONTRAST COMPARISON: date INDICATIONS: Lower abdominal/flank pain on the right TECHNIQUE: After obtaining the patients consent, CT images were created with intravenous iodinated contrast. FINDINGS: LIVER: The liver is normal in size. No suspicious liver lesion is seen. The portal and hepatic veins are patent. BILIARY: No biliary duct dilation. The biliary system is otherwise unremarkable. PANCREAS: No focal pancreatic lesion. No pancreatic duct dilation. SPLEEN: No suspicious splenic lesion is seen. The spleen is normal in size. KIDNEYS: No suspicious renal lesion is seen. No hydronephrosis. ADRENALS: No adrenal gland nodule or thickening. AORTA/VASCULAR: No aneurysm. RETROPERITONEUM: No lymphadenopathy. BOWEL/MESENTERY: The appendix is normal. No bowel wall thickening or bowel dilation. ABDOMINAL WALL: No hernia. URINARY BLADDER: Incomplete bladder distension limits evaluation, but no focal wall thickening or calculus is seen. PELVIC NODES: No lymphadenopathy. PELVIC ORGANS: Status post hysterectomy. No pelvic mass. BONES: No acute fracture or suspicious osseous lesion. LUNG BASES: No pleural effusion or consolidation. OTHER: Small hiatal hernia. CONCLUSION: 1. No acute process is detected. 2. Small hiatal hernia   FINAL REPORT Attending Radiologist: No Findings NaN No label liver: the liver is normal in size. no suspicious liver lesion is seen. the portal and hepatic veins are patent. biliary: no biliary duct dilation. the biliary system is otherwise unremarkable. pancreas: no focal pancreatic lesion. no pancreatic duct dilation. spleen: no suspicious splenic lesion is seen. the spleen is normal in size. kidneys: no suspicious renal lesion is seen. no hydronephrosis. adrenals: no adrenal gland nodule or thickening. aorta/vascular: no aneurysm. retroperitoneum: no lymphadenopathy. bowel/mesentery: the appendix is normal. no bowel wall thickening or bowel dilation. abdominal wall: no hernia. urinary bladder: incomplete bladder distension limits evaluation, but no focal wall thickening or calculus is seen. pelvic nodes: no lymphadenopathy. pelvic organs: status post hysterectomy. no pelvic mass. bones: no acute fracture or suspicious osseous lesion. lung bases: no pleural effusion or consolidation. other: small hiatal hernia. conclusion: 1. no acute process is detected. 2. small hiatal hernia
2 3 EXAM: MRI ABDOMEN W WO CONTRAST CLINICAL INDICATION: Cirrhosis of liver without ascites, unspecified hepatic cirrhosis type (CMS-HCC) TECHNIQUE: MRI of the abdomen was performed with and without contrast. Multiplanar imaging was performed. 8.5 cc of Gadavist was administered. COMPARISON: DATE and priors FINDINGS: On limited views of the lung bases, no acute abnormality is noted. There may be mild distal esophageal wall thickening. On the out of phase series, there is suggestion of some signal gain within the hepatic parenchyma. This is stable. A tiny cystic nonenhancing focus is seen anteriorly in the right hepatic lobe (9/10), unchanged. A subtly micronodular hepatic periphery is noted. There are few subtle hypervascular lesions in the right hepatic lobe, without significant washout. The portal vein is patent. Some splenorenal shunting is redemonstrated, similar to the comparison exam. The spleen measures 12.4 cm in length. No focal splenic lesion is appreciated. There are several small renal lesions again seen, many of which again demonstrate T1 shortening. On the postcontrast subtraction series, no obvious enhancement is noted. The adrenal glands and pancreas are intact. There is mild cholelithiasis, without gallbladder wall thickening or pericholecystic fluid. No free abdominal fluid is visualized. IMPRESSION: 1. Stable cirrhotic appearance of the liver. Few subtly hypervascular hepatic lesions do not demonstrate washout, and probably relate to perfusion variants. No particularly suspicious hepatic mass is seen. 2. Mild splenomegaly to 12.4 cm redemonstrated. Splenorenal shunting is again seen. 3. Scattered simple and complex renal cystic lesions, nonenhancing, stable from March 2040. 4. Incidentally, there is evidence of signal gain in the liver on the out of phase series. This occasionally may represent iron overload.   FINAL REPORT Attending Radiologist: No Findings NaN No label on limited views of the lung bases, no acute abnormality is noted. there may be mild distal esophageal wall thickening. on the out of phase series, there is suggestion of some signal gain within the hepatic parenchyma. this is stable. a tiny cystic nonenhancing focus is seen anteriorly in the right hepatic lobe (9/10), unchanged. a subtly micronodular hepatic periphery is noted. there are few subtle hypervascular lesions in the right hepatic lobe, without significant washout. the portal vein is patent. some splenorenal shunting is redemonstrated, similar to the comparison exam. the spleen measures 12.4 cm in length. no focal splenic lesion is appreciated. there are several small renal lesions again seen, many of which again demonstrate t1 shortening. on the postcontrast subtraction series, no obvious enhancement is noted. the adrenal glands and pancreas are intact. there is mild cholelithiasis, without gallbladder wall thickening or pericholecystic fluid. no free abdominal fluid is visualized. impression: 1. stable cirrhotic appearance of the liver. few subtly hypervascular hepatic lesions do not demonstrate washout, and probably relate to perfusion variants. no particularly suspicious hepatic mass is seen. 2. mild splenomegaly to 12.4 cm redemonstrated. splenorenal shunting is again seen. 3. scattered simple and complex renal cystic lesions, nonenhancing, stable from march 2040. 4. incidentally, there is evidence of signal gain in the liver on the out of phase series. this occasionally may represent iron overload.

Define Model Constants

Next, we define some constants that will help parameterize our model. These numbers can be tuned to your specific application. max_sequence_length represents the max length of the reports; in general, we found the impression section of the NM radiology reports was about 250 words long, so this was set to 300. max_num_words represents the max number of words in the vocab to start with; ultimately, the model will use the actual vocab size for training. Lastly, glove_embedding_dim is the dimension (a hyperparameter) of the word embedding as defined by the GloVe word vectors. Unless you use one of their other embeddings, this number stays the same; regardless, it should match the downloaded GloVe word vector file.

[2]:
# Define model constants
max_sequence_length = 300       # Max length of report. Avg NM is ~250
max_num_words = 15000           # Max number of words for vocab
glove_embedding_dim = 300       # GloVe embedding dimension size

Tokenize the Data using a GloVe Embedding

Using a Keras Tokenizer object, we fit the tokenizer on the whole text so that each word is assigned a unique integer index. We add basic filtering to keep special characters from being assigned a value and lowercase all the text to prevent capitalization variants from generating separate tokens (e.g., “lung” vs “Lung” being assigned different tokens).

Padding is used so that all reports are the same length. In this case, we prepad, since we generally found that, for NM reports, the radiology findings and follow-up recommendations were located in the last section of the report. So if a report is longer than our defined max_sequence_length (300), the beginning of the text is truncated (pad_sequences also defaults to truncating="pre"), keeping the end of the report where the findings live; if the report is shorter, the tokenizer adds 0 values (i.e., a placeholder token) at the beginning of the text.

Lastly, we calculate vocab_size (the number of unique tokens plus one, to account for the 0 padding index) to give to the model.

[3]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Define the tokenizer
# Lowercase the text; filter out special characters
tokenizer = Tokenizer(num_words=max_num_words, filters='!"#$%&()*+,-./:;<=>?@[\]^_`{|}~', lower=True)
tokenizer.fit_on_texts(X)
word_index = tokenizer.word_index
vocab_size = len(word_index)+1

# Tokenize the notes
# Prepad since radiology findings are almost always located in the last section of the report
X_tokenized = tokenizer.texts_to_sequences(X)
X_tokenized = pad_sequences(X_tokenized, maxlen=max_sequence_length, padding="pre")

print('Found %s unique tokens.' % len(word_index))
Found 545 unique tokens.
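To make the prepadding behavior concrete, here is a quick illustrative check (the sample sentence is made up) showing the placeholder 0s at the front and the real tokens at the end:

# Illustrative only: a short report gets left-padded with 0s up to max_sequence_length
demo_seq = tokenizer.texts_to_sequences(["no pleural effusion or consolidation"])
demo_padded = pad_sequences(demo_seq, maxlen=max_sequence_length, padding="pre")
print(demo_padded.shape)     # (1, 300)
print(demo_padded[0][:3])    # [0 0 0], i.e., padding tokens
print(demo_padded[0][-3:])   # the last three real tokens of the sentence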

To set up the GloVe embedding matrix, first download the 300-dimension GloVe embedding file glove.6B.300d.txt (see the GloVe project website). Do this either by manually downloading and extracting the embeddings from the .zip source into the workspace, or by running the following commands in your CLI:

wget "https://nlp.stanford.edu/data/glove.6B.zip" -O /tmp/temp.zip
unzip /tmp/temp.zip glove.6B.300d.txt -d /workspace/data
rm /tmp/temp.zip

Next, build an embedding index that maps each word present in the GloVe embedding file to its vector:

[4]:
import numpy as np

# Data path to the pre-downloaded Stanford pretrained word vectors
# TODO: update this path to your local location of GloVe Stanford pretrained word vectors `glove.6B.300d`
glove_embedding_path = "/path/to/data/glove.6B.300d.txt"

# Build the GloVe embedding index: word -> 300-dim vector
# NOTE: Stanford pretrained word vectors glove.6B.300d were downloaded from https://nlp.stanford.edu/projects/glove/
glove_embeddings_index = {}
with open(glove_embedding_path, encoding="utf8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        try:
            coefs = np.asarray(values[1:], dtype="float32")
        except ValueError:
            # Skip malformed lines rather than reusing a stale vector
            continue
        glove_embeddings_index[word] = coefs

# Initialize the embedding matrix randomly; words not found in the GloVe
# index keep their random initialization
glove_embedding_matrix = np.random.random((len(word_index) + 1, glove_embedding_dim))
for word, i in word_index.items():
    glove_embedding_vector = glove_embeddings_index.get(word)
    if glove_embedding_vector is not None:
        if len(glove_embedding_vector) != glove_embedding_dim:
            raise ValueError(
                f"GloVe vector for {word!r} has length {len(glove_embedding_vector)}, "
                f"but glove_embedding_dim is {glove_embedding_dim}; make sure the "
                "embedding dimension matches the downloaded GloVe file"
            )
        glove_embedding_matrix[i] = glove_embedding_vector
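As a quick sanity check (not part of the original script), you can measure how much of the report vocabulary actually received a pretrained vector; a very low hit rate usually indicates a tokenization or file-path problem:

# Illustrative diagnostic: fraction of vocab words found in the GloVe index
n_hits = sum(1 for word in word_index if word in glove_embeddings_index)
print("GloVe coverage: %d/%d words (%.1f%%)" % (n_hits, len(word_index), 100 * n_hits / len(word_index)))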

Define the Model

Next we define the model, which, in this case, is a stacked BiLSTM. We use the TensorFlow and Keras libraries to define this custom model. These layers can be modified as needed.

[5]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding, SpatialDropout1D, Bidirectional, LSTM
from tensorflow.keras.optimizers import Adam

model = Sequential()
# Frozen GloVe embedding layer
model.add(Embedding(vocab_size,
                    glove_embedding_dim,
                    weights=[glove_embedding_matrix],
                    input_length=max_sequence_length,
                    trainable=False))
model.add(SpatialDropout1D(0.25))
# Three stacked BiLSTM layers; only the last one collapses the sequence
model.add(Bidirectional(LSTM(200, return_sequences=True)))
model.add(Bidirectional(LSTM(200, return_sequences=True)))
model.add(Bidirectional(LSTM(200)))
model.add(Dropout(0.1))
model.add(Dense(12))
model.add(Dense(units=2, activation="softmax"))
adam = Adam(learning_rate=0.0011)
model.compile(loss="categorical_crossentropy", optimizer=adam, metrics=["accuracy"])

model.summary(expand_nested=True)
2022-03-07 23:48:42.157710: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-07 23:48:42.915489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14635 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0001:00:00.0, compute capability: 7.0
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, 300, 300)          163800

 spatial_dropout1d (SpatialD  (None, 300, 300)         0
 ropout1D)

 bidirectional (Bidirectiona  (None, 300, 400)         801600
 l)

 bidirectional_1 (Bidirectio  (None, 300, 400)         961600
 nal)

 bidirectional_2 (Bidirectio  (None, 400)              961600
 nal)

 dropout (Dropout)           (None, 400)               0

 dense (Dense)               (None, 12)                4812

 dense_1 (Dense)             (None, 2)                 26

=================================================================
Total params: 2,893,438
Trainable params: 2,729,638
Non-trainable params: 163,800
_________________________________________________________________

Train the Model

First, we split the data into 80/20 train and test sets. Note that a different random state is used here (vs. the source code) to suit this small sample dataset.

[6]:
from sklearn.model_selection import train_test_split

# Split the data into train and test
train_x, valid_x, train_y, valid_y = train_test_split(X_tokenized, labels, test_size=0.20, random_state=25)

Next we define directories and output file names. These locations are where our final, best trained model will live once training is complete. Once trained, these model weights can be used to classify new reports (a sketch of that inference flow appears at the end of this notebook).

[7]:
model_checkpoint_name = "/path/to/results/phase01/demo/findings/findings_best_model.h5"
result_fname = "/path/to/results/phase01/demo/findings/findings_best_result.log"

# Make dirs to save results
os.makedirs(os.path.dirname(model_checkpoint_name), exist_ok=True)
os.makedirs(os.path.dirname(result_fname), exist_ok=True)

Now it’s time to train! Keras does a lot of the heavy lifting here. We add a callback to stop early if the validation loss stops improving, and we save only the best checkpoint since we only care about the model with the best performance. In this demo the model trains for up to 30 epochs; the full source code trains for upwards of 100.

[8]:
import tensorflow.keras.backend as K
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# Clear the Keras backend
K.clear_session()

# Train!
es = EarlyStopping(monitor="val_loss", mode="min", verbose=1, patience=15,)
mc = ModelCheckpoint(model_checkpoint_name, monitor="val_loss", mode="min", verbose=1, save_best_only=True,)
model.fit(
    train_x,
    to_categorical(train_y),
    epochs=30,
    batch_size=100,
    callbacks=[es, mc],
    verbose=1,
    validation_data=(valid_x, to_categorical(valid_y)),
)

Epoch 1/30
2022-03-07 23:48:52.718937: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8204
1/1 [==============================] - ETA: 0s - loss: 0.6717 - accuracy: 0.6250
Epoch 00001: val_loss improved from inf to 0.78142, saving model to /path/to/results/phase01/demo/findings/findings_best_model.h5
1/1 [==============================] - 11s 11s/step - loss: 0.6717 - accuracy: 0.6250 - val_loss: 0.7814 - val_accuracy: 0.3333
Epoch 2/30
1/1 [==============================] - ETA: 0s - loss: 0.5393 - accuracy: 0.7500
Epoch 00002: val_loss did not improve from 0.78142
1/1 [==============================] - 0s 127ms/step - loss: 0.5393 - accuracy: 0.7500 - val_loss: 0.9052 - val_accuracy: 0.6667
Epoch 3/30
1/1 [==============================] - ETA: 0s - loss: 0.7345 - accuracy: 0.6250
Epoch 00003: val_loss did not improve from 0.78142
1/1 [==============================] - 0s 123ms/step - loss: 0.7345 - accuracy: 0.6250 - val_loss: 0.9365 - val_accuracy: 0.3333
Epoch 4/30
1/1 [==============================] - ETA: 0s - loss: 0.3943 - accuracy: 0.7500
Epoch 00004: val_loss did not improve from 0.78142
1/1 [==============================] - 0s 116ms/step - loss: 0.3943 - accuracy: 0.7500 - val_loss: 0.9092 - val_accuracy: 0.3333
Epoch 5/30
1/1 [==============================] - ETA: 0s - loss: 0.4060 - accuracy: 0.7500
Epoch 00005: val_loss improved from 0.78142 to 0.53731, saving model to /path/to/results/phase01/demo/findings/findings_best_model.h5
1/1 [==============================] - 0s 245ms/step - loss: 0.4060 - accuracy: 0.7500 - val_loss: 0.5373 - val_accuracy: 0.6667
Epoch 6/30
1/1 [==============================] - ETA: 0s - loss: 0.2056 - accuracy: 1.0000
Epoch 00006: val_loss did not improve from 0.53731
1/1 [==============================] - 0s 116ms/step - loss: 0.2056 - accuracy: 1.0000 - val_loss: 0.6331 - val_accuracy: 0.6667
Epoch 7/30
1/1 [==============================] - ETA: 0s - loss: 0.2440 - accuracy: 0.8750
Epoch 00007: val_loss improved from 0.53731 to 0.42821, saving model to /path/to/results/phase01/demo/findings/findings_best_model.h5
1/1 [==============================] - 0s 219ms/step - loss: 0.2440 - accuracy: 0.8750 - val_loss: 0.4282 - val_accuracy: 0.6667
Epoch 8/30
1/1 [==============================] - ETA: 0s - loss: 0.0470 - accuracy: 1.0000
Epoch 00008: val_loss did not improve from 0.42821
1/1 [==============================] - 0s 113ms/step - loss: 0.0470 - accuracy: 1.0000 - val_loss: 0.5175 - val_accuracy: 0.6667
Epoch 9/30
1/1 [==============================] - ETA: 0s - loss: 0.0264 - accuracy: 1.0000
Epoch 00009: val_loss did not improve from 0.42821
1/1 [==============================] - 0s 117ms/step - loss: 0.0264 - accuracy: 1.0000 - val_loss: 1.1193 - val_accuracy: 0.6667
Epoch 10/30
1/1 [==============================] - ETA: 0s - loss: 0.0244 - accuracy: 1.0000
Epoch 00010: val_loss did not improve from 0.42821
1/1 [==============================] - 0s 116ms/step - loss: 0.0244 - accuracy: 1.0000 - val_loss: 1.3914 - val_accuracy: 0.3333
Epoch 11/30
1/1 [==============================] - ETA: 0s - loss: 0.0069 - accuracy: 1.0000
Epoch 00011: val_loss did not improve from 0.42821
1/1 [==============================] - 0s 111ms/step - loss: 0.0069 - accuracy: 1.0000 - val_loss: 1.5533 - val_accuracy: 0.3333
Epoch 12/30
1/1 [==============================] - ETA: 0s - loss: 0.0034 - accuracy: 1.0000
Epoch 00012: val_loss did not improve from 0.42821
1/1 [==============================] - 0s 114ms/step - loss: 0.0034 - accuracy: 1.0000 - val_loss: 1.5892 - val_accuracy: 0.6667
Epoch 13/30
1/1 [==============================] - ETA: 0s - loss: 0.0054 - accuracy: 1.0000
Epoch 00013: val_loss did not improve from 0.42821
1/1 [==============================] - 0s 114ms/step - loss: 0.0054 - accuracy: 1.0000 - val_loss: 1.2127 - val_accuracy: 0.6667
Epoch 14/30
1/1 [==============================] - ETA: 0s - loss: 0.0013 - accuracy: 1.0000
Epoch 00014: val_loss did not improve from 0.42821
1/1 [==============================] - 0s 118ms/step - loss: 0.0013 - accuracy: 1.0000 - val_loss: 0.7344 - val_accuracy: 0.6667
Epoch 15/30
1/1 [==============================] - ETA: 0s - loss: 0.0012 - accuracy: 1.0000
Epoch 00015: val_loss improved from 0.42821 to 0.29115, saving model to /path/to/results/phase01/demo/findings/findings_best_model.h5
1/1 [==============================] - 0s 229ms/step - loss: 0.0012 - accuracy: 1.0000 - val_loss: 0.2912 - val_accuracy: 0.6667
Epoch 16/30
1/1 [==============================] - ETA: 0s - loss: 7.6486e-04 - accuracy: 1.0000
Epoch 00016: val_loss improved from 0.29115 to 0.12271, saving model to /path/to/results/phase01/demo/findings/findings_best_model.h5
1/1 [==============================] - 0s 221ms/step - loss: 7.6486e-04 - accuracy: 1.0000 - val_loss: 0.1227 - val_accuracy: 1.0000
Epoch 17/30
1/1 [==============================] - ETA: 0s - loss: 5.9440e-04 - accuracy: 1.0000
Epoch 00017: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 113ms/step - loss: 5.9440e-04 - accuracy: 1.0000 - val_loss: 0.3862 - val_accuracy: 0.6667
Epoch 18/30
1/1 [==============================] - ETA: 0s - loss: 6.6654e-04 - accuracy: 1.0000
Epoch 00018: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 118ms/step - loss: 6.6654e-04 - accuracy: 1.0000 - val_loss: 0.9521 - val_accuracy: 0.6667
Epoch 19/30
1/1 [==============================] - ETA: 0s - loss: 3.7514e-04 - accuracy: 1.0000
Epoch 00019: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 117ms/step - loss: 3.7514e-04 - accuracy: 1.0000 - val_loss: 1.4473 - val_accuracy: 0.6667
Epoch 20/30
1/1 [==============================] - ETA: 0s - loss: 3.6986e-04 - accuracy: 1.0000
Epoch 00020: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 115ms/step - loss: 3.6986e-04 - accuracy: 1.0000 - val_loss: 1.8139 - val_accuracy: 0.6667
Epoch 21/30
1/1 [==============================] - ETA: 0s - loss: 3.2677e-04 - accuracy: 1.0000
Epoch 00021: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 117ms/step - loss: 3.2677e-04 - accuracy: 1.0000 - val_loss: 2.0806 - val_accuracy: 0.6667
Epoch 22/30
1/1 [==============================] - ETA: 0s - loss: 2.0694e-04 - accuracy: 1.0000
Epoch 00022: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 118ms/step - loss: 2.0694e-04 - accuracy: 1.0000 - val_loss: 2.2770 - val_accuracy: 0.6667
Epoch 23/30
1/1 [==============================] - ETA: 0s - loss: 1.7372e-04 - accuracy: 1.0000
Epoch 00023: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 122ms/step - loss: 1.7372e-04 - accuracy: 1.0000 - val_loss: 2.4248 - val_accuracy: 0.6667
Epoch 24/30
1/1 [==============================] - ETA: 0s - loss: 1.7007e-04 - accuracy: 1.0000
Epoch 00024: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 117ms/step - loss: 1.7007e-04 - accuracy: 1.0000 - val_loss: 2.5394 - val_accuracy: 0.6667
Epoch 25/30
1/1 [==============================] - ETA: 0s - loss: 1.7452e-04 - accuracy: 1.0000
Epoch 00025: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 112ms/step - loss: 1.7452e-04 - accuracy: 1.0000 - val_loss: 2.6312 - val_accuracy: 0.6667
Epoch 26/30
1/1 [==============================] - ETA: 0s - loss: 1.3593e-04 - accuracy: 1.0000
Epoch 00026: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 115ms/step - loss: 1.3593e-04 - accuracy: 1.0000 - val_loss: 2.7067 - val_accuracy: 0.6667
Epoch 27/30
1/1 [==============================] - ETA: 0s - loss: 1.3290e-04 - accuracy: 1.0000
Epoch 00027: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 115ms/step - loss: 1.3290e-04 - accuracy: 1.0000 - val_loss: 2.7706 - val_accuracy: 0.6667
Epoch 28/30
1/1 [==============================] - ETA: 0s - loss: 9.7877e-05 - accuracy: 1.0000
Epoch 00028: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 114ms/step - loss: 9.7877e-05 - accuracy: 1.0000 - val_loss: 2.8258 - val_accuracy: 0.6667
Epoch 29/30
1/1 [==============================] - ETA: 0s - loss: 9.9574e-05 - accuracy: 1.0000
Epoch 00029: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 117ms/step - loss: 9.9574e-05 - accuracy: 1.0000 - val_loss: 2.8743 - val_accuracy: 0.6667
Epoch 30/30
1/1 [==============================] - ETA: 0s - loss: 6.7379e-05 - accuracy: 1.0000
Epoch 00030: val_loss did not improve from 0.12271
1/1 [==============================] - 0s 113ms/step - loss: 6.7379e-05 - accuracy: 1.0000 - val_loss: 2.9174 - val_accuracy: 0.6667
[8]:
<keras.callbacks.History at 0x7f4c14f29370>

Evaluate the Results

Using sklearn’s classification_report and confusion_matrix, we can evaluate how well the model performs on the test dataset.

[9]:
from keras.models import load_model
from sklearn.metrics import classification_report, confusion_matrix

# Load in the best model
best_model = load_model(model_checkpoint_name)

# Compute the classification report and confusion matrix and save the results
y_pred = best_model.predict(np.array(valid_x))
y_pred = np.argmax(y_pred, axis=1)
report = classification_report(valid_y, y_pred)
matrix = confusion_matrix(valid_y, y_pred)
print(report)
print(matrix)
with open(result_fname, "w") as fh:
    fh.write("Classification Report:\n")
    fh.write(report)
    fh.write("\n\nConfusion Matrix:\n")
    fh.write(np.array2string(matrix, separator=", "))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         1

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3

[[2 0]
 [0 1]]
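
Finally, as promised above, here is a hedged sketch of how the saved best checkpoint could be used to classify a brand-new report, reusing the tokenizer and preprocessing from earlier in this notebook (raw_report is a placeholder for a new radiology report string):

from nmrezman import utils

# Illustrative inference sketch: preprocess -> tokenize -> pad -> predict
raw_report = "..."                                  # placeholder: a new, raw radiology report
new_note = utils.preprocess_input(raw_report, is_phase_2=False)
seq = pad_sequences(tokenizer.texts_to_sequences([new_note]), maxlen=max_sequence_length, padding="pre")
pred = best_model.predict(seq).argmax(axis=1)[0]
print("Findings" if pred == 1 else "No Findings")   # label encoding from above: 0 = "No Findings"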