Journal:
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2025, 22(2):997-1010 ISSN:1545-5971
Author affiliations:
[Xinglin Zhang] School of Computer Science and Engineering, South China University of Technology, Guangzhou, China;[Anfeng Liu] School of Electronic Information, Central South University, Changsha, China;School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China;[Zhemin Yang; Min Yang] School of Computer Science, Fudan University, Shanghai, China;[Zhetao Li] National & Local Joint Engineering Research Center of Network Security Detection and Protection Technology, Guangdong Provincial Key Laboratory of Data Security and Privacy Protection, College of Information Science and Technology, Jinan University, Guangzhou, China
Abstract:
As a collaborative and open network, an IoT-based data collection network allows billions of devices to join freely for data perception and transmission. Along with this trend, more and more malicious attackers enter the network; they steal or tamper with data and hinder data exchange and communication. To address these issues, we propose a Proactive Trust Evaluation System (PTES) for secure data collection that evaluates the trust of mobile data collectors. Specifically, PTES guarantees evaluation accuracy across trust evidence acquisition, trust evidence storage, and trust value calculation. First, PTES obtains trust evidence based on active detection by drones, feedback from interacted objects, and recommendations from trusted third parties. Then, these pieces of trust evidence are stored according to interaction time using a sliding window mechanism. After that, credible, untrustworthy, and uncertain evidence sequences are extracted from the storage space and assigned positive, negative, and tendentious trust values, respectively. Consequently, the final normalized trust is obtained by combining the three trust values. Finally, extensive experiments conducted on a real-world dataset demonstrate that PTES is superior to benchmark methods in terms of detection accuracy and profit.
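The sliding-window evidence store and the combination of positive, negative, and tendentious trust values can be illustrated with a minimal sketch. The class name, window size, and the 0.5 tendentious weight are illustrative assumptions, not the paper's actual formulas:

```python
from collections import deque

class SlidingWindowTrust:
    """Sketch of a sliding-window trust store: evidence is kept by
    interaction time and classified as credible (+1), untrustworthy (-1),
    or uncertain (0)."""

    def __init__(self, window_size=10, uncertain_bias=0.5):
        self.window = deque(maxlen=window_size)  # oldest evidence is evicted first
        self.uncertain_bias = uncertain_bias     # tendentious value for uncertain evidence

    def add_evidence(self, label):
        # label must be one of {+1, -1, 0}
        self.window.append(label)

    def trust(self):
        if not self.window:
            return 0.5  # no evidence yet: neutral prior
        pos = sum(1 for e in self.window if e == 1)
        unc = sum(1 for e in self.window if e == 0)
        # combine positive and tendentious contributions (negative evidence
        # contributes 0), then normalize to [0, 1]
        return (pos + self.uncertain_bias * unc) / len(self.window)
```

Evicting old evidence via `deque(maxlen=...)` mirrors the abstract's time-ordered storage: stale interactions stop influencing the normalized trust once they fall out of the window.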
Abstract:
Despite the evident advantages of variants of UNet in medical image segmentation, these methods still exhibit limitations in the extraction of foreground, background, and boundary features. Based on feature guidance, we propose a new network (FG-UNet). Specifically, adjacent high-level and low-level features are used to gradually guide the network to perceive lesion features. To accommodate lesion features of different scales, the multi-order gated aggregation (MGA) block is designed based on multi-order feature interactions. Furthermore, a novel feature-guided context-aware (FGCA) block is devised to enhance the capability of FG-UNet to segment lesions by fusing boundary-enhancing features, object-enhancing features, and uncertain areas. Eventually, a bi-dimensional interaction attention (BIA) block is designed to enable the network to highlight crucial features effectively. To appraise the effectiveness of FG-UNet, experiments were conducted on Kvasir-seg, ISIC2018, and COVID-19 datasets. The experimental results illustrate that FG-UNet achieves a DSC score of 92.70% on the Kvasir-seg dataset, which is 1.15% higher than that of the latest SCUNet++, 4.70% higher than that of ACC-UNet, and 5.17% higher than that of UNet.
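The DSC score reported above is the standard Dice similarity coefficient; it can be computed in a few lines of numpy (a generic definition, independent of the paper's implementation):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient (DSC) between two binary masks:
    DSC = 2|A intersect B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)
```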
Abstract:
The rapid development of the Internet has led to the widespread dissemination of manipulated facial images, significantly impacting people's daily lives. With the continuous advancement of Deepfake technology, generated counterfeit facial images have become increasingly difficult to distinguish, and there is an urgent need for a more robust and convincing detection method. Current detection methods mainly operate in the spatial domain or transform the spatial domain into other domains for analysis. With the emergence of transformers, some researchers have also combined traditional convolutional networks with transformers for detection. This paper explores the artifacts left by Deepfakes in various domains and, based on this exploration, proposes a detection method that uses the steganalysis rich model to extract high-frequency noise to complement spatial features. We design two main modules, built on traditional convolutional neural networks, to fully exploit the interaction between these two aspects. The first is the multi-scale mixed feature attention module, which introduces artifacts from high-frequency noise into spatial textures, thereby enhancing the model's learning of spatial texture features. The second is the multi-scale channel attention module, which reduces the impact of background noise by weighting the features. Our proposed method was experimentally evaluated on mainstream datasets, and extensive experimental results demonstrate the effectiveness of our approach in detecting Deepfake-forged faces, outperforming the majority of existing methods.
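The high-frequency noise extraction can be sketched with one fixed high-pass residual filter. The kernel below is the standard "square 3x3" filter from steganalysis rich models; the paper's actual filter bank (rich models use roughly 30 such filters) is not specified here:

```python
import numpy as np

# One SRM-style high-pass kernel; real steganalysis rich models apply a
# whole bank of such residual filters.
SRM_KERNEL = np.array([[-1,  2, -1],
                       [ 2, -4,  2],
                       [-1,  2, -1]], dtype=np.float32) / 4.0

def high_freq_residual(gray):
    """Convolve a grayscale image (2-D array) with a fixed high-pass kernel
    to expose high-frequency noise that complements spatial features."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float32)  # 'valid' convolution
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(gray[i:i + 3, j:j + 3] * SRM_KERNEL)
    return out
```

Because the kernel's coefficients sum to zero, smooth regions map to zero and only edges and noise (where forgery artifacts tend to live) survive.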
Abstract:
Images captured under improper exposure conditions lose their brightness information and texture details. Therefore, the enhancement of low-light images has received widespread attention. In recent years, most methods have been based on deep convolutional neural networks that enhance low-light images in the spatial domain, which tends to introduce a huge number of parameters, thus limiting their practical applicability. In this paper, we propose a Fourier-based two-stage low-light image enhancement method via mutual learning (FT-LLIE), which sequentially enhances the amplitude and phase components. Specifically, we design the amplitude enhancement module (AEM) and phase enhancement module (PEM). In these two enhancement stages, we design the amplitude enhancement block (AEB) and phase enhancement block (PEB) based on the Fast Fourier Transform (FFT) to deal with the amplitude component and the phase component, respectively. In AEB and PEB, we design a spatial unit (SU) and a frequency unit (FU) to process spatial- and frequency-domain information, and adopt a mutual learning strategy so that the local features extracted in the spatial domain and the global features extracted in the frequency domain can learn from each other, yielding complementary information that enhances the image. Extensive experiments show that our network requires only a small number of parameters to effectively enhance image details, outperforming existing low-light image enhancement algorithms in both qualitative and quantitative results.
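The amplitude/phase decomposition that AEM and PEM operate on can be sketched with numpy's FFT (a toy round trip, not the paper's network; scaling the amplitude while keeping the phase is what lets an amplitude stage adjust brightness without disturbing structure):

```python
import numpy as np

def split_amp_phase(img):
    """Decompose a 2-D image into its Fourier amplitude and phase."""
    f = np.fft.fft2(img)
    return np.abs(f), np.angle(f)

def merge_amp_phase(amp, phase):
    """Recombine (possibly enhanced) amplitude and phase into an image."""
    f = amp * np.exp(1j * phase)
    return np.real(np.fft.ifft2(f))
```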
Abstract:
Images captured in the wild often suffer from issues such as under-exposure, over-exposure, or sometimes a combination of both. These images tend to lose details and texture due to uneven exposure. The majority of image enhancement methods currently focus on correcting either under-exposure or over-exposure, but only a few methods can effectively handle the two problems simultaneously. To address these issues, a novel partition-based exposure correction method is proposed. Firstly, our method calculates the illumination map to generate a partition mask that divides the original image into under-exposed and over-exposed areas. Then, we propose a Transformer-based parameter estimation module to estimate the dual gamma values for partition-based exposure correction. Finally, we introduce a dual-branch fusion module to merge the original image with the exposure-corrected image to obtain the final result. It is worth noting that the illumination map plays a guiding role in both the estimation of the dual gamma model parameters and the dual-branch fusion. Extensive experiments demonstrate that the proposed method consistently achieves superior performance over state-of-the-art (SOTA) methods on 9 datasets with paired or unpaired samples. Our code is available at https://github.com/csust7zhangjm/ExposureCorrectionWMS .
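The partition-plus-dual-gamma idea can be sketched in a few lines. Here the illumination map is simply the (grayscale, [0, 1]-scaled) image itself, and the threshold and the two gamma values are illustrative assumptions; in the paper they are estimated by a Transformer-based module:

```python
import numpy as np

def dual_gamma_correct(img, thresh=0.5, gamma_under=0.6, gamma_over=1.8):
    """Partition-based exposure correction sketch: a partition mask derived
    from an illumination map separates under- and over-exposed pixels, and a
    different gamma is applied to each partition."""
    illum = img                 # stand-in illumination map
    under = illum < thresh      # partition mask
    out = np.empty_like(img)
    out[under] = img[under] ** gamma_under    # gamma < 1 brightens dark areas
    out[~under] = img[~under] ** gamma_over   # gamma > 1 darkens bright areas
    return out
```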
Author affiliations:
[Guangxi Hu; Haimei Luo; Yuxian Zhang; Jiayu Zeng; Yanliang He; Xianping Wang; Wen Yuan] Jiangxi Provincial Key Laboratory of Advanced Electronic Materials and Devices (No. 2024SSY03011), Nanchang 330022, China;[Jiajia Zhao] School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China
Corresponding author affiliations:
[Haimei Luo] J;[Jiajia Zhao] S;Jiangxi Provincial Key Laboratory of Advanced Electronic Materials and Devices (No. 2024SSY03011), Nanchang 330022, China; School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China
Abstract:
This paper proposes a sensor for simultaneously measuring temperature and strain using a microfiber Bragg grating (MFBG), half of which is coated with the polymer polydimethylsiloxane (PDMS) and encapsulated inside a pair of U-shaped glass grooves, while the other half is bare. Due to the different thermo-optic coefficients, elasto-optic coefficients, and cross-sectional dimensions, the two reflection bands from the encapsulated part of the MFBG and the bare MFBG section respond differently to temperature and strain. The temperature sensitivity differs significantly between the two peaks, and their reflection wavelengths shift in opposite directions, enabling effective detection of temperature variations. The strain is almost entirely concentrated on the bare MFBG (BMFBG), since the cross-sectional area of the encapsulated MFBG (EMFBG) is much larger than that of the BMFBG. The experimental results show that temperature sensitivities of −31.92 pm/°C and 10.31 pm/°C and strain sensitivities of ~0 pm/με and 6.24 pm/με are achieved, respectively. The sensor has the advantages of high sensitivity and a simple structure and can measure strain and temperature simultaneously.
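Given the reported sensitivities, the two measurands can be recovered by inverting the 2x2 sensitivity matrix, the standard demodulation scheme for dual-parameter FBG sensors. The matrix below simply plugs in the abstract's numbers, with the ~0 pm/με strain sensitivity taken as exactly zero:

```python
import numpy as np

# Sensitivity matrix built from the reported values:
# rows = the two reflection peaks (EMFBG, BMFBG),
# columns = (temperature, pm/degC) and (strain, pm/microstrain).
K = np.array([[-31.92, 0.0],
              [ 10.31, 6.24]])

def demodulate(shift_emfbg_pm, shift_bmfbg_pm):
    """Recover (delta_T in degC, delta_strain in microstrain) from the two
    measured wavelength shifts by solving the 2x2 linear system."""
    return np.linalg.solve(K, np.array([shift_emfbg_pm, shift_bmfbg_pm]))
```

The opposite signs of the two temperature sensitivities make the matrix well-conditioned, which is what allows temperature and strain to be separated from a single measurement.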
Keywords:
Audio-guided video object segmentation;Referring video object segmentation;Expression-visual attention;Audio-text contrastive learning;Multi-task learning
Abstract:
Audio-guided Video Object Segmentation (A-VOS) and Referring Video Object Segmentation (R-VOS) are two highly related tasks aiming to segment specific objects from video sequences according to expression prompts. However, due to the challenges of modeling representations for different modalities, existing methods struggle to balance interaction flexibility and localization precision. In this paper, we address this problem from two perspectives: the alignment of audio and text and the deep interaction among audio, text, and visual modalities. First, we propose a universal architecture, the Expression Prompt Collaboration Transformer, herein EPCFormer. Next, we propose an Expression Alignment (EA) mechanism for audio and text. The proposed EPCFormer exploits the fact that audio and text prompts referring to the same objects are semantically equivalent by applying contrastive learning to both kinds of expressions. Then, to facilitate deep interactions among audio, text, and visual modalities, we introduce an Expression-Visual Attention (EVA) module. By deeply exploring complementary cues between text and audio, knowledge of video object segmentation in terms of expression prompts can seamlessly transfer between the two tasks. Experiments on well-recognized benchmarks demonstrate that our EPCFormer attains state-of-the-art results on both tasks.
Journal:
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2025, 12(2):623-635 ISSN:2327-4697
Author affiliations:
[Xiaopeng Fan] Hangzhou Institute of Advanced Technology, Hangzhou, China;[Shiming He] School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China;[Qixue Lin; Xiaocan Li; Kun Xie] College of Computer Science and Electronics Engineering and the Ministry of Education Key Laboratory of Supercomputing and Artificial Intelligence Fusion Computing, Hunan University, Changsha, China;[Quan Feng] Hunan Vanguard Group Corporation Ltd., Changsha, China;[Jigang Wen] School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
Abstract:
Comprehensive network monitoring data is crucial for anomaly detection and network optimization tasks. However, due to factors such as sampling strategies and failures in data transmission or storage, only incomplete monitoring data can be obtained. Traditional techniques for completing network monitoring data matrices have limitations in leveraging network-related features and lack the adaptability required for offline and online execution. In this paper, we introduce a novel approach that significantly improves the integration of network features and operational flexibility in data completion tasks. By converting the data matrix into a bipartite graph and integrating network features into the graph's node attributes, we redefine the problem of missing data completion. This transformation reframes the issue as estimating unobserved edges in the bipartite graph. We propose the Bi-directional Bipartite Graph Completion (BGC) model, a flexible framework that seamlessly adapts to both offline and online data completion tasks. This model encapsulates static, dynamic, bi-directional temporal features and network topology, thereby improving the accuracy of unobserved edge estimation. Experiments conducted on two public data traces demonstrate the superiority of our method over six baseline models. Our method not only achieves higher accuracy in offline scenarios but also displays remarkable speed in online situations.
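The reformulation of matrix completion as bipartite-graph edge estimation can be sketched directly: rows and columns become the two node sets, observed entries become weighted edges, and missing entries become the edges to estimate. This is only the problem conversion, not the BGC model itself:

```python
import numpy as np

def matrix_to_bipartite(m):
    """Convert a monitoring-data matrix with missing entries (NaN) into a
    bipartite graph. Returns (observed edges as (row_node, col_node, value),
    missing edges as (row_node, col_node) pairs to be estimated)."""
    edges, missing = [], []
    rows, cols = m.shape
    for i in range(rows):
        for j in range(cols):
            if np.isnan(m[i, j]):
                missing.append((f"r{i}", f"c{j}"))
            else:
                edges.append((f"r{i}", f"c{j}", m[i, j]))
    return edges, missing
```

In the paper, network features are further attached to the row/column nodes as attributes, so a graph model can exploit topology when predicting the missing edges.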
Abstract:
Zeroing neural networks (ZNNs) are an effective tool for solving the synchronization problem of chaotic systems and have received widespread attention. However, previous ZNN models focused only on the synchronization of real-valued chaotic systems and fail to tackle the synchronization of complex chaotic systems. To address this, this paper proposes a novel complex-valued time-varying zeroing neural network (CVTVZNN) model. We rigorously derive the fixed-time convergence and external-noise suppression ability of the CVTVZNN model when solving the synchronization problem of complex chaotic systems. Three numerical experiments on the synchronization of the complex Chen chaotic system, a complex autonomous chaotic system, and the complex dynamos chaotic system verify that the CVTVZNN model converges faster and suppresses disturbances more accurately than other existing ZNN models. Notably, the experimental results show that the CVTVZNN model takes only about 0.00115 s to complete the synchronization task, whether in noise-free environments or under external noise. In the same experimental environment, other models either cannot achieve synchronization at all or require at least 0.217 s, at least 188 times the synchronization time of the CVTVZNN model. Furthermore, its practical value has been demonstrated through an implementation on a field-programmable gate array.
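The zeroing principle behind all ZNN variants is to define a synchronization error e(t) and impose dynamics that drive it to zero. The sketch below integrates the simplest form, de/dt = -gamma * e, with forward Euler on a complex-valued error; the paper's CVTVZNN replaces this linear activation with a time-varying, fixed-time one, so this only illustrates the principle:

```python
import numpy as np  # only used by the test assertions below

def znn_error_decay(e0, gamma=10.0, dt=1e-3, steps=1000):
    """Forward-Euler integration of the basic zeroing dynamics
    de/dt = -gamma * e for a complex-valued error e(t)."""
    e = complex(e0)
    for _ in range(steps):
        e += dt * (-gamma * e)  # Euler step of the zeroing ODE
    return e

# Complex initial error, as in complex chaotic system synchronization.
residual = znn_error_decay(1.0 + 2.0j)
```

With a linear activation the error only decays exponentially; fixed-time activations (as in CVTVZNN) guarantee convergence within a bound independent of the initial error.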
Abstract:
Recently, numerous degraded images have flooded search engines and social networks, finding extensive and practical applications in the real world. However, these images have also posed new challenges to conventional image retrieval tasks. To this end, we introduce a new task of retrieving degraded images through deep hashing from large-scale databases, and further present the Locality-Sensitive Hashing Network (LSHNet) to tackle it in a self-supervised manner. More specifically, we first propose a triplet strategy to enable the self-supervised training of LSHNet in an end-to-end fashion. Thanks to the designed strategy, the high semantic similarity and discriminability of degraded images are well preserved in our learned latent codes without requiring additional human labor to label large numbers of degraded images. Moreover, to tackle large-scale image retrieval efficiently, we further propose to transform the latent codes into locality-sensitive hashing codes such that degraded images can be retrieved in sublinear time with their representation ability almost unaffected. Extensive experiments are conducted on three public benchmarks, where the results demonstrate the superior performance of LSHNet in retrieving similar images under degraded conditions.
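The transformation from real-valued latent codes to locality-sensitive hashing codes can be sketched with the classic random-hyperplane scheme (a generic LSH construction; the paper's learned transform may differ):

```python
import numpy as np

def lsh_codes(latents, n_bits=16, seed=0):
    """Random-hyperplane LSH: project latent codes onto random directions
    and keep the signs as binary hash bits. Similar latents land in the
    same bucket with high probability, enabling sublinear retrieval."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((latents.shape[1], n_bits))
    return (latents @ planes > 0).astype(np.uint8)
```

Retrieval then hashes the query once and compares it only against items in the matching bucket, instead of scanning the whole database.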
Authors:
Jianming Zhang*;Dianwen Li;Zhigao Zeng;Rui Zhang;Jin Wang
Journal:
Engineering Applications of Artificial Intelligence, 2025, 150:110536 ISSN:0952-1976
Corresponding author:
Jianming Zhang
Author affiliations:
Key Laboratory of Safety Control of Bridge Engineering, Ministry of Education (Changsha University of Science and Technology), Changsha, 410114, China;[Zhigao Zeng] School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, 410114, China;[Rui Zhang] National Engineering Research Center of Highway Maintenance Technology, Changsha University of Science and Technology, Changsha, 410114, China;[Jin Wang] Sanya Institute, Hunan University of Science and Technology, Sanya, 572024, China;[Jianming Zhang; Dianwen Li] Key Laboratory of Safety Control of Bridge Engineering, Ministry of Education (Changsha University of Science and Technology), Changsha, 410114, China; School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, 410114, China
Corresponding author affiliations:
[Jianming Zhang] K;Key Laboratory of Safety Control of Bridge Engineering, Ministry of Education (Changsha University of Science and Technology), Changsha, 410114, China; School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, 410114, China
Abstract:
Cracks are among the most common pavement distresses. If not promptly repaired, they hasten the deterioration of the road. Semantic segmentation is the most convenient pavement crack detection method for assessing the damage level. Convolutional neural networks (CNNs) excel at extracting local spatial information but have limitations in capturing global contextual information. Therefore, a dual-branch crack segmentation network (DBCNet) with Mamba and multi-shape convolutional kernels is proposed. First, a dual-branch encoder, consisting of a spatial branch and a context branch, is employed to extract both spatial and contextual information. A cross-like block (CrossBlock) that excels at extracting spatial information horizontally and vertically from cracks is proposed; multiple CrossBlocks are stacked to construct a lightweight network as the spatial branch. An improved Visual State Space Model (VMamba) serves as the context branch, modeling long-range dependencies for more accurate pixel-by-pixel segmentation. Second, a Feature Fusion Module (FFM), based on squeeze-and-excitation attention, is constructed to dynamically fuse the features from the two branches layer by layer. Third, a Cross-aware Mamba Module (CMM) with a hybrid CNN-Mamba architecture is proposed to compose the decoder. Fourth, comprehensive evaluations were conducted on three public datasets. Performance on multiple metrics improved considerably, outperforming seven state-of-the-art models: the mean intersection over union (mIoU) on Deepcrack, CrackTree 260, and CFD reached 87.87%, 85.34%, and 81.35%, respectively. Code and data will be available at https://github.com/name191/DBCNet .
Abstract:
With the emergence of advanced backdoor defense methods, the success rate of backdoor attacks on Deep Neural Networks (DNNs) has dramatically decreased. This situation may lead to overconfidence in existing backdoor defense methods. In view of this, we propose an adversarial distillation strategy combined with a Gaussian reinforcement mechanism (ADGR) for enhancing the resilience of backdoor attacks in the face of defenses. This strategy uses a Gaussian-blur-based backdoor reinforcement to strengthen the attack against backdoor defenses. Moreover, we design adversarial distillation strategies to improve the consistency between the backdoor model's output and the clean model's output, which further strengthens the attack against defenses. Extensive experiments on the CIFAR-10, GTSRB, CIFAR-100, and Tiny-ImageNet datasets show that ADGR improves the attack effect by an average of about 5% over 8 mainstream backdoor attacks. In addition, after being subjected to 11 different backdoor defenses, ADGR's attack success rate still averages more than 90%. The code is available at: https://github.com/hubin111/ADGR .
Authors:
Yuan Chen;Chongju Zhong;Pinyi Huang;Wangyang Cai;Lei Wang
Author affiliations:
[Wangyang Cai] School of Computer and Communication Engineering, Changsha University of Science and Technology;[Yuan Chen; Chongju Zhong; Pinyi Huang; Lei Wang] School of Computer Science and Engineering, Central South University
Conference:
ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference date:
06 April 2025
Conference location:
Hyderabad, India
Proceedings:
ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract:
Micro-expression (ME) recognition holds great potential for revealing true human emotions. A significant barrier to effective ME recognition is the lack of sufficient annotated ME video data, because MEs are subtle and involuntary facial expressions that are very hard to capture. To address this issue, data augmentation techniques, such as ME migration based on a driving video, have been employed to enrich training samples. Considering that MEs can be complex facial movements involving multiple action unit (AU) changes, we propose a novel ME generation approach that creates more realistic facial sequences by fusing MEs from multiple videos rather than a single driving video. To enhance the effectiveness of multi-sequence ME transfer, we adapt the thin plate spline motion model and improve traditional face alignment methods to better suit it, facilitating multi-sequence-driven ME generation. In our experiments, we conduct a downstream ME recognition task using models trained on our augmented ME sequences to demonstrate the effectiveness of our approach on the SAMM, SMIC, and CASME II datasets. The results confirm that our proposed approach outperforms state-of-the-art (SOTA) augmentation and generation methods in terms of F1 score and recognition accuracy.
Abstract:
Image inpainting has achieved a superior performance boost in inpainting quality with transformers, owing to their powerful long-range dependency modeling capacity. However, because their computational complexity grows quadratically with spatial resolution, transformers are not suitable for high-resolution image inpainting tasks, especially when attempting to model structure and texture separately. It remains challenging to accurately and reasonably recover global structures and texture details while maintaining competitive inference efficiency. In this paper, we propose a novel and efficient method for high-resolution image inpainting using a lightweight structure-guided network, which includes a high-resolution structure restoration (HRSR) sub-network and a structure-enhanced texture restoration (SETR) sub-network. Specifically, we introduce an efficient transformer block into the HRSR network, which combines recursive gated convolution with a hybrid attention mechanism to enhance dependency modeling and effectively merge information from both known and masked regions, allowing for global structure reconstruction at high resolution and high performance. Additionally, we introduce a global-local gated transformer block into the SETR, which employs a gating mechanism to transfer useful global and local features and uses structural prior information to guide the texture restoration process for the final inpainting. We conduct extensive experiments on several benchmark datasets to demonstrate the effectiveness of our LSG-Net. The results show that our method achieves a new state-of-the-art performance on the high-resolution image inpainting task, for example, with a PSNR of 25.68 and an SSIM of 0.930 on the Indoor dataset, while significantly reducing the number of parameters (268.7 M) and computational complexity. Furthermore, we demonstrate our method on various applications.
Our code and models are available at https://github.com/LYaNing-LSG/LSG-Net.git .
Abstract:
In recent years, synthetic aperture radar (SAR) ship detection has improved significantly due to the rapid development of deep learning. However, when ship targets are densely arranged or exhibit multiscale variations, issues such as large aspect-ratio differences still cause false alarms, missed detections, and low detection accuracy. To overcome these challenges, this letter introduces a novel detection model, PEGNet, based on Faster R-CNN. First, to identify ship targets at different scales, the path aggregation feature pyramid network (PAFPN) is integrated into the feature fusion structure, enhancing the network's feature representation and robustness. Second, efficient multiscale attention (EMA) is employed to strengthen detection accuracy by reducing noise interference and enhancing feature stability. Third, the guided anchoring region proposal network (GA-RPN) is introduced to produce anchors that more accurately reflect the actual positions and scales of targets, improving localization precision and lowering the missed detection rate. The performance of PEGNet was tested on the SSDD and the high-resolution SAR images dataset (HRSID), achieving mAP scores of 71.1% and 67.9%, respectively. Compared to the baseline network, this represents improvements of 2.5% and 7.6%, highlighting the method's superior performance compared to other approaches.
Author affiliations:
[Xig, Jiaojiao; Li, Wenjun; Ma, Wanjun; Peng, Huan] Changsha Univ Sci & Technol, Hunan Prov Key Lab Intelligent Proc Big Data Tran, Changsha, Hunan, Peoples R China.;[Liang, Weijun] Univ South China, Affiliated Changsha Cent Hosp, Hengyang Med Sch, Changsha, Hunan, Peoples R China.
Conference:
7th Chinese Conference on Pattern Recognition and Computer Vision
Conference dates:
OCT 18-20, 2024
Conference venue:
Urumqi, PEOPLES R CHINA
Proceedings series:
Lecture Notes in Computer Science
Keywords:
Embedded Deep Learning;Rifampicin;Drug Resistance;Tuberculosis;CT Images;Diagnostic Application
Abstract:
In the treatment of tuberculosis (TB), drug-resistant tuberculosis arises when Mycobacterium tuberculosis undergoes genetic mutations or acquires resistance through horizontal gene transfer. Identifying the treatment response of TB patients to Rifampicin, a principal medication for TB treatment, is essential for healthcare professionals to make timely and accurate diagnoses. Not only can this approach save on the costs and duration of TB treatment, but it also helps prevent the disease's spread and fatalities. Traditional methods for diagnosing Rifampicin-resistant TB involve molecular biology tests and drug susceptibility testing, which are time-consuming, expensive, and labor-intensive. To assist physicians in diagnosing the treatment response of TB patients to Rifampicin more rapidly and efficiently, this study introduces a computer-aided diagnostic algorithm based on Embedded Deep Learning (EDL). Initially, CT images from target patients at two imaging centers were collected. The classifier model used in this research combines image preprocessing techniques, three convolutional neural networks, and decision fusion technology to enhance the model's classification efficiency and reduce overfitting. Additionally, the Grad-CAM model was utilized for visualizing the areas of lesions. In the test sets from both centers, the Embedded Deep Learning Model (EDL Model) demonstrated superior performance over other models by combining hard voting or soft voting mechanisms, with an average accuracy improvement of 3.16-16.87%, AUC increase of 3.05-12.66%, and F1-score enhancement of 6.38-22.49%. The diagnostic tool developed in this research for assisting in the diagnosis of TB patients' response to Rifampicin treatment has significant clinical potential, particularly in settings lacking specialized radiological expertise.
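The decision-fusion step the abstract describes (combining the three CNNs' predictions by hard or soft voting) can be sketched generically in Python; the probability arrays below are hypothetical stand-ins for the networks' softmax outputs, not results from the paper:

```python
import numpy as np

def soft_vote(prob_list):
    """Soft voting: average the class probabilities of all models,
    then take the most probable class per sample."""
    return np.mean(prob_list, axis=0).argmax(axis=1)

def hard_vote(prob_list):
    """Hard voting: each model casts its predicted label; the most
    frequent label per sample wins."""
    labels = np.stack([p.argmax(axis=1) for p in prob_list])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, labels)

# hypothetical softmax outputs of three CNNs for two CT images, two classes
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.8, 0.2], [0.3, 0.7]])
p3 = np.array([[0.6, 0.4], [0.55, 0.45]])
print(soft_vote([p1, p2, p3]))  # [0 1]
print(hard_vote([p1, p2, p3]))  # [0 1]
```

Soft voting keeps each model's confidence, while hard voting discards it; here they agree, but they can diverge when one model is confidently wrong.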
Corresponding institution:
[Yin, B] Changsha Univ Sci & Technol, Sch Comp & Commun Engn, Changsha, Peoples R China.
Keywords:
Blockchain;sharding;accounts;transactions
Abstract:
Sharding is a promising technique for increasing a blockchain system's throughput by enabling parallel transaction processing. The main challenge of state sharding lies in ensuring atomic verification of cross-shard transactions, which doubles the communication overhead and increases transaction confirmation time. Previous research has primarily focused on developing cross-shard protocols for the fast and reliable validation of transactions involving multiple shards. These studies typically generate a large number of cross-shard transactions because they rely on simple address mapping for state sharding, i.e., the prefix/suffix of the account address. In this article, we propose a state sharding scheme via density-based partitioning of the account-transaction graph. To reduce cross-shard transactions, the scheme groups correlated accounts into the same shard by generating the densest subgraphs, as graph density captures the correlation among accounts, i.e., how often transactions have occurred among them. We formulate the graph-density-based state sharding problem, with the goal of maximizing the average density across all shards under a workload constraint, and prove its NP-completeness. To reduce the complexity of finding the densest subgraph, we propose a pruning-based algorithm that shrinks the search space by pre-pruning invalid edges based on the concept of the core number. We also extend the linear deterministic greedy algorithm and the PageRank algorithm to handle new transactions in the dynamic scenario. We conduct extensive experiments using real transaction data from Ethereum. The experimental results demonstrate a strong correlation between shard density and the number of cross-shard transactions, and the pruning-based algorithm reduces the running time by an order of magnitude.
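The density-driven account grouping can be illustrated with Charikar's classic greedy peeling heuristic for the densest subgraph. This is a generic stand-in, not the paper's pruning-based algorithm, and the toy account-transaction graph is hypothetical:

```python
from collections import defaultdict

def densest_subgraph(edges):
    """Charikar's greedy 2-approximation: repeatedly remove the
    minimum-degree node and return the intermediate node set whose
    density |E|/|V| was highest."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    m = len(edges)
    best_density, best = m / len(nodes), set(nodes)
    while len(nodes) > 1:
        v = min(nodes, key=lambda x: len(adj[x]))  # min-degree account
        m -= len(adj[v])                           # its incident edges vanish
        for u in adj[v]:
            adj[u].discard(v)
        nodes.discard(v)
        del adj[v]
        density = m / len(nodes)
        if density > best_density:
            best_density, best = density, set(nodes)
    return best, best_density

# toy account graph: a 4-clique of mutually transacting accounts
# plus one weakly connected account "e"
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
core, density = densest_subgraph(edges)
print(sorted(core), density)  # ['a', 'b', 'c', 'd'] 1.5
```

Placing the dense core `{a, b, c, d}` in one shard leaves only the single `d`-`e` transaction crossing shards, which is the intuition behind maximizing per-shard density.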
Journal:
Electric Power Systems Research,2025年244:111556 ISSN:0378-7796
Corresponding author:
Jin, K
Author affiliations:
[Zhu, Min] Zhejiang Shuren Univ, Coll Informat Sci & Technol, Hangzhou, Peoples R China.;[Rasheed, Rassol Hamed] Warith Al Anbiyaa Univ, Fac Engn, Air Conditioning Engn Dept, Karbala, Iraq.;[Albahadly, Emad jassim kassed] Minist Elect, Directorate Gen Elect Distribut South, Baghdad, Iraq.;[Zhang, Jingyu] Changsha Univ Sci & Technol, Sch Comp & Commun Engn, Changsha 410004, Peoples R China.;[Alqahtani, Fayez] King Saud Univ, Coll Comp & Informat Sci, Software Engn Dept, Riyadh 12372, Saudi Arabia.
Corresponding institution:
[Jin, K] Hunan Univ Sci & Technol, Sanya Res Inst, Sanya 572000, Peoples R China.
Keywords:
Fixed battery energy storage;Mobile battery energy storage;Flexibility;Two-way active distribution network;Robust optimization
Abstract:
The use of flexibility resources in the electricity distribution network is aimed at achieving more optimal operation of the network. One method of providing flexibility is the use of energy storage systems. In a distribution network operated under variable tariffs, energy storage systems create flexibility by charging in off-peak hours and discharging in peak hours. Batteries, the most widely used storage systems in distribution network operation, are divided into two categories: fixed and mobile. The installation location of a fixed battery is known in advance, whereas a mobile battery is transported on a truck and its location changes during operation; this is the spatial-temporal flexibility of mobile batteries, and their locations are chosen from the parking lots created for the truck to stop. In this paper, a formulation of two-way distribution network operation in the presence of fixed and mobile batteries is presented. First, the models of the fixed and mobile batteries are presented, and then they are aggregated into the two-way distribution network operation problem. Unlike a fixed battery, which has only temporal flexibility, a mobile battery has both temporal and spatial flexibility, and both are formulated. In the presented model, a robust optimization method is used to model the uncertainties of the problem. Finally, simulations on the IEEE 33-bus network demonstrate the capability of the presented model and the flexibility it exploits.
Four general states are considered: without any batteries, with only the fixed battery, with only the mobile battery, and with both fixed and mobile batteries simultaneously. For each state, three supply cases are examined: feeding from bus 1, feeding from bus 33, and feeding from both sides. The investigated indices are the objective function value, the apparent power peak of the network, and the active power losses, and the results show the effectiveness of the flexibility exploited in this paper. With both storage types in the network and supply from both sides, the objective function value, the active power losses, and the apparent power peak are reduced by 12, 70, and 13 percent, respectively, compared to the case in which no storage is used.
Abstract:
To address the weak single-domain generalization of existing crowd counting methods, this study proposes a new crowd counting framework called Multi-scale Attention and Hierarchy level Enhancement (MAHE). First, by fusing channel attention and spatial attention, the model can focus on both detailed features and macro information about structural position changes. Second, a multi-head attention feature module helps the model capture complex dependency relationships between sequence elements. In addition, a three-stage encoding-decoding scheme enables the model to effectively represent crowd density information. Finally, multi-scale features derived from different receptive fields are further fused through multi-scale hierarchy-level feature fusion, enabling the model to learn both high-level semantic information and low-level multi-scale visual field features. This design enhances the model's capacity to capture key feature information even on highly differentiated datasets, thereby improving its single-domain generalization ability. Extensive experiments on different datasets demonstrate the model's strong generalization capability. This study not only improves the accuracy of crowd counting but also introduces a new research approach for single-domain generalization in crowd counting.
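The fused channel-and-spatial gating described above can be sketched as a generic CBAM-style module in plain NumPy; this is only an assumed illustration of the attention-fusion idea, not MAHE's actual block:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Gate each channel by its global-average response (input is C x H x W)."""
    w = sigmoid(x.mean(axis=(1, 2)))        # one weight per channel
    return x * w[:, None, None]

def spatial_attention(x):
    """Gate each spatial location by the channel-averaged response."""
    m = sigmoid(x.mean(axis=0))             # one weight per pixel
    return x * m[None, :, :]

def fused_attention(x):
    """Sequential channel-then-spatial gating, CBAM style."""
    return spatial_attention(channel_attention(x))

feat = np.random.randn(4, 8, 8)             # toy feature map
out = fused_attention(feat)
print(out.shape)  # (4, 8, 8)
```

The channel gate emphasizes "what" is informative and the spatial gate "where", which is the division of labor the abstract attributes to its attention fusion.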
Abstract:
Encrypted images produced by conventional image encryption algorithms lack visual security, so they are easily recognized, and then decrypted or attacked, by adversaries on public channels. To solve this problem, this paper proposes a visually meaningful image encryption (VMIE) method based on a new chaotic map to improve encryption complexity and unpredictability. Moreover, we design a two-way intertwine scrambling and deep embedding algorithm that protects the image content while keeping the encrypted image visually secure. First, a new one-dimensional chaotic map combining sine and tangent functions is designed to construct the measurement matrix, and a new two-way intertwine scrambling algorithm scrambles the sparse matrix of the encrypted image. Second, the chaotic system generates the measurement matrix and the diffusion matrix for compressing and diffusing the scrambled image. Finally, a new embedding strategy is adopted to retain more information of the plain image and reduce information loss. The experimental results show that the average PSNR of the encrypted image is 38.96 dB and that of the decrypted image is 34.59 dB. Compared with existing schemes, the algorithm offers better visual quality and reconstruction quality.
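As a rough illustration of how a one-dimensional map mixing sine and tangent terms can drive a chaotic sequence, here is a minimal sketch; the map definition, the `mu` parameter, and the burn-in length are all assumed for illustration, since the paper's exact map is not reproduced here:

```python
import math

def sine_tan_map(x, mu=0.9):
    """Illustrative 1-D chaotic map mixing sine and tangent terms;
    the fractional part folds the orbit back into [0, 1)."""
    y = math.sin(math.pi * mu * x) + math.tan(x) / 4.0
    return y - math.floor(y)

def keystream(seed, n, burn_in=200):
    """Iterate the map, discard a transient burn-in, and emit n values."""
    x = seed
    for _ in range(burn_in):
        x = sine_tan_map(x)
    out = []
    for _ in range(n):
        x = sine_tan_map(x)
        out.append(x)
    return out

# tiny seed changes yield completely different streams (key sensitivity)
print(keystream(0.3567, 4))
print(keystream(0.3568, 4))
```

Such a sequence, quantized and reshaped, is the kind of pseudo-random source from which measurement and diffusion matrices are typically derived in chaos-based schemes.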