M1 Max GPU performance drop
I was training neural networks when I noticed an 8x performance drop. In Activity Monitor, GPU usage was only 20%.
I quickly ran a TensorFlow FLOPS test, and it showed 0.5 TFLOPS versus the usual 7-8 TFLOPS. Benchmark code:
import time

import tensorflow as tf

n = 7192
dtype = tf.float32

# run in TF1-style graph mode so each sess.run() call executes one matmul
tf.compat.v1.disable_eager_execution()
with tf.device("/GPU:0"):
    matrix1 = tf.Variable(tf.ones((n, n), dtype=dtype))
    matrix2 = tf.Variable(tf.ones((n, n), dtype=dtype))
    product = tf.matmul(matrix1, matrix2)

# avoid optimizing away redundant nodes
config = tf.compat.v1.ConfigProto(
    graph_options=tf.compat.v1.GraphOptions(
        optimizer_options=tf.compat.v1.OptimizerOptions(
            opt_level=tf.compat.v1.OptimizerOptions.L0)))
sess = tf.compat.v1.Session(config=config)
sess.run(tf.compat.v1.global_variables_initializer())
iters = 15

# pre-warming: the first run includes one-time setup cost, so it is not timed
sess.run(product.op)

start = time.time()
for i in range(iters):
    sess.run(product.op)
end = time.time()

ops = n**3 + (n-1)*n**2  # n^2*(n-1) additions, n^3 multiplications
elapsed = end - start
rate = iters*ops/elapsed/10**12  # tera-operations per second
print('\n %d x %d matmul took: %.2f sec, %.2f T ops/sec' % (n, n,
                                                            elapsed/iters,
                                                            rate))
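For reference, here is a minimal sketch of the arithmetic behind the numbers above, using the same operation count as the benchmark. The helper names matmul_ops and seconds_per_matmul are just for illustration, and 7.5 TFLOPS is my assumed midpoint of the usual 7-8 TFLOPS range. It estimates how long one 7192 x 7192 matmul should take at the healthy rate versus the degraded 0.5 TFLOPS:

```python
def matmul_ops(n: int) -> int:
    """Operation count for an n x n by n x n matmul, as in the benchmark:
    n^3 multiplications plus n^2 * (n - 1) additions."""
    return n**3 + (n - 1) * n**2


def seconds_per_matmul(n: int, tflops: float) -> float:
    """Expected time for one matmul at a given sustained rate in TFLOPS."""
    return matmul_ops(n) / (tflops * 10**12)


if __name__ == "__main__":
    n = 7192
    # 7.5 is an assumed midpoint of the usual 7-8 TFLOPS; 0.5 is the degraded rate
    for label, rate in (("usual", 7.5), ("degraded", 0.5)):
        print("%s (%.1f TFLOPS): %.3f sec per matmul"
              % (label, rate, seconds_per_matmul(n, rate)))
```

So if the benchmark reports well over a second per matmul at this size, the GPU is in the slow state.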
Restarting the MacBook fixed it, but I do not want to run into the same issue again. Having to restart the Mac in the middle of NN training is unpleasant.
Has anyone faced a similar performance drop? I have not found any reports of this issue online yet.
PS: This is unrelated to TensorFlow or Python. I also ran a Swift Metal performance test, and it also showed extremely low GPU FLOPS, lower than the CPU's in fact.
System: macOS Monterey 12.2
Spec: M1 Max 32 GPU Cores / 32 GB RAM