To address the challenges of strong multiscale spatiotemporal feature coupling and complex long-term dependencies in short-term power load data, this paper proposes a hybrid short-term load forecasting (STLF) method based on an improved Temporal Fusion Transformer (TFT) model. First, a parallel dilated convolutional network (DCNN) is constructed as a feature extraction module, leveraging convolutional kernels with different dilation rates to capture local periodic patterns and multiscale spatiotemporal correlations in load sequences. Second, a Bidirectional Gated Recurrent Unit (BiGRU) is used...
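As a rough illustration of the parallel dilated convolution idea, the sketch below runs the same kernel over a load sequence at several dilation rates and stacks the branch outputs per time step. It is a minimal sketch, not the paper's DCNN: the function names, the causal zero-padding, the fixed (unlearned) kernel weights, and the dilation rates (1, 2, 4) are all assumptions made for illustration.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Causal 1-D dilated convolution (assumed form); output has the same length as x.

    Positions before the start of the sequence are treated as zeros.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation  # look back j * dilation time steps
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def parallel_dilated_features(
    x: Sequence[float],
    kernel: Sequence[float],
    dilations: Sequence[int] = (1, 2, 4),  # hypothetical rates, not from the paper
) -> List[List[float]]:
    """Run one branch per dilation rate and stack the outputs per time step."""
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [list(step) for step in zip(*branches)]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of time steps spanned by a single dilated kernel."""
    return (kernel_size - 1) * dilation + 1


load = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy load values
features = parallel_dilated_features(load, kernel=[0.5, 0.5])
print(features[-1])  # one feature per dilation rate at the last time step
```

With a kernel of size 2, the three branches average the current sample with the one 1, 2, and 4 steps back, so each branch responds to a different time scale; in a trained network the kernel weights would be learned rather than fixed.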