Pruning networks prior to training makes generalization more challenging than ever, yet recent studies focus mainly on the trainability of the pruned networks in isolation. This paper explores a new perspective: the implicit decrease in loss on the data yet to be trained that is induced by training on a single batch in each round, whose first-order approximation we term gradient coupled flow. We accordingly present a criterion sensitive to gradient coupled flow (GCS), hypothesized to capture, at initialization, the weights that matter most for boosting performance. Interestingly, our explorations show there exists ...
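Since the text above only names the quantity, a minimal worked sketch of what such a first-order approximation could look like may help fix ideas. This is our own illustration, not the paper's definition; the symbols $w$ (weights), $\eta$ (learning rate), $\mathcal{L}_B$ (loss on the training batch), and $\mathcal{L}_D$ (loss on the remaining data to be trained) are assumed notation.

```latex
% Sketch (assumed notation, not the paper's exact definition):
% one SGD step on batch B changes the loss on the remaining data D,
% to first order in the learning rate \eta, by
\[
  \mathcal{L}_{D}\!\bigl(w - \eta\,\nabla_{w}\mathcal{L}_{B}(w)\bigr)
  - \mathcal{L}_{D}(w)
  \;\approx\;
  -\,\eta\, \nabla_{w}\mathcal{L}_{B}(w)^{\top}\,
  \nabla_{w}\mathcal{L}_{D}(w).
\]
% The implicit loss decrease is thus governed by the inner product of
% the batch gradient and the gradient on the data yet to be trained;
% this coupling term is a natural candidate for what the text calls
% gradient coupled flow.
```

Under this reading, a pruning criterion sensitive to gradient coupled flow would score weights by how strongly their removal perturbs this gradient inner product, though the paper's precise scoring rule is not given in the excerpt above.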