A 2 Million Parameter Denoising Model and a Poison Detection Model
Code: https://github.com/livinginparadise/GRDDenoiser
This work introduces an efficient denoising model with a compact architecture of 2 million parameters, together with a dedicated "Predictor" model that detects whether an image has been poisoned.
The denoiser architecture prioritizes bias-free operation and responsiveness to varying noise levels.
Here is how it works:
Gaussian Prior Extraction: A ResidualPriorExtractor extracts a Gaussian prior, using fixed kernels to separate high-frequency details (edges and noise) from the smooth background. This provides initial information that highlights regions susceptible to poisoning.
Noise Conditioning: The model incorporates a NoiseConditioner that projects the noise level (sigma) and a content descriptor into an embedding, which then modulates the network layers. This lets the model adapt its sensitivity to the noise level of each image.
Bias-Free Design: All convolutional layers use bias=False, so the network relies on feature data and normalization (LayerNorm2d) rather than learned offsets. Bias-free designs have been reported to generalize better across noise levels in restoration tasks.
Gated Residual Blocks: The core architecture utilizes Global Gating within Residual Blocks. A gating value (ranging from 0 to 1), derived from the global mean of features, selectively regulates information flow.
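The four components above can be sketched roughly as follows in PyTorch. This is a minimal illustration under stated assumptions, not the code from the repository: the class names ResidualPrior and GatedResidualBlock, the 5x5 kernel, the use of the global feature mean as the "content descriptor", the scale-only LayerNorm2d, and all layer sizes are hypothetical choices made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> torch.Tensor:
    """Return a normalized 2D Gaussian kernel of shape (size, size)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return kernel / kernel.sum()


class ResidualPrior(nn.Module):
    """Fixed (non-learned) Gaussian blur; returns the high-frequency residual."""

    def __init__(self, channels: int = 3, size: int = 5, sigma: float = 1.0):
        super().__init__()
        k = gaussian_kernel(size, sigma).expand(channels, 1, size, size).clone()
        self.register_buffer("kernel", k)  # a buffer, so it is never trained
        self.channels = channels
        self.pad = size // 2

    def forward(self, x):
        padded = F.pad(x, [self.pad] * 4, mode="reflect")
        smooth = F.conv2d(padded, self.kernel, groups=self.channels)
        return x - smooth  # what remains is edges and noise


class LayerNorm2d(nn.Module):
    """Per-pixel normalization over channels with a scale but no bias term."""

    def __init__(self, channels: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(dim=1, keepdim=True)
        var = x.var(dim=1, keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight


class NoiseConditioner(nn.Module):
    """Project sigma plus a content descriptor into a conditioning embedding."""

    def __init__(self, channels: int, embed_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1 + channels, embed_dim, bias=False),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim, bias=False),
        )

    def forward(self, sigma, feats):
        descriptor = feats.mean(dim=(2, 3))  # assumed descriptor: global mean
        return self.mlp(torch.cat([sigma[:, None], descriptor], dim=1))


class GatedResidualBlock(nn.Module):
    """Bias-free residual block with noise modulation and a global gate."""

    def __init__(self, channels: int, embed_dim: int):
        super().__init__()
        self.norm = LayerNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.to_scale = nn.Linear(embed_dim, channels, bias=False)
        self.to_gate = nn.Linear(channels, channels, bias=False)

    def forward(self, x, emb):
        h = self.conv1(self.norm(x))
        h = h * (1 + self.to_scale(emb)[:, :, None, None])  # noise modulation
        h = self.conv2(F.gelu(h))
        # Gate in (0, 1) computed from the global mean of the features.
        gate = torch.sigmoid(self.to_gate(h.mean(dim=(2, 3))))
        return x + h * gate[:, :, None, None]
```

In this sketch, a batch x of shape (N, C, H, W) and a per-image sigma of shape (N,) would be used as emb = NoiseConditioner(C, D)(sigma, x) followed by GatedResidualBlock(C, D)(x, emb). Keeping the Gaussian kernel in a buffer rather than a parameter is what makes it "fixed": it moves with the module across devices but receives no gradient updates.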
The "Predictor" model, so named for its predictive capabilities, simultaneously classifies images as either poisoned or safe and predicts a noise mask indicating the location of the poison.
The model employs a GhostResidualDecomposition-Net architecture to achieve high accuracy:
Backbone (ResNet + SE): The encoder utilizes Residual Blocks enhanced with Squeeze-and-Excitation (SE) Blocks, enabling channel attention and weighting of important feature maps.
ASPP (Atrous Spatial Pyramid Pooling): Located at the bottleneck, this module employs dilated convolutions to capture contextual information at multiple scales, facilitating the detection of both fine noise patterns and global image structure.
Attention Gates in the Decoder: Attention Gates are incorporated within the skip connections during image upsampling, filtering features to focus on poisoned regions.
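For readers unfamiliar with these three building blocks, here is a minimal PyTorch sketch of each. These are generic textbook-style versions, not the repository's implementation: the reduction ratio, dilation rates (1, 6, 12), channel counts, and the additive form of the attention gate are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels from globally pooled stats."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))  # squeeze to (N, C), then excite
        return x * w[:, :, None, None]


class ASPP(nn.Module):
    """Parallel dilated 3x3 convolutions, concatenated and fused by a 1x1 conv."""

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12)):
        super().__init__()
        # padding == dilation keeps the spatial size unchanged for 3x3 kernels
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        multi_scale = [F.relu(branch(x)) for branch in self.branches]
        return self.fuse(torch.cat(multi_scale, dim=1))


class AttentionGate(nn.Module):
    """Additive attention gate applied to a skip connection."""

    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, 1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)

    def forward(self, skip, gate):
        # Bring the coarser decoder signal up to the skip feature resolution.
        gate = F.interpolate(gate, size=skip.shape[2:], mode="bilinear",
                             align_corners=False)
        attn = torch.sigmoid(self.psi(F.relu(self.w_skip(skip) + self.w_gate(gate))))
        return skip * attn  # one attention value in (0, 1) per pixel
```

In an encoder-decoder layout, SEBlock would sit inside each encoder residual block, ASPP at the bottleneck, and AttentionGate on each skip connection, with the decoder feature map passed as gate. Because the attention map has a single channel, it suppresses or keeps whole spatial locations, which is what lets the decoder focus on suspected poisoned regions.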
"Ghost Loss" Function
The predictor model is trained with a custom loss function, "Ghost Loss", designed to encourage realistic reconstructions. It combines four distinct penalties:
Pixel-wise Noise Match: Measures the agreement between the predicted noise mask and the ground truth poison.
Restoration Match (MSE): Evaluates the similarity between the restored image (the input with the predicted noise mask subtracted) and the original clean image.
Binary Classification (BCE): Assesses the accuracy of the poison/safe classification.
Semantic Anchor (Perceptual Loss): A frozen VGG16 network is used to compare features of the restored image with those of the clean image, preventing excessive blurring and preserving important details.
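The four penalties could be combined along the following lines. This is a hedged sketch rather than the repository's loss: the function name ghost_loss, the L1 distance for the noise match, the MSE on features for the perceptual term, and the term weights are all assumptions, since the description above does not state them. The feature_extractor argument stands in for the frozen VGG16 (for example, a truncated VGG16 feature stack with gradients disabled).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def ghost_loss(pred_noise, pred_logit, poisoned, clean, is_poisoned,
               feature_extractor, weights=(1.0, 1.0, 1.0, 0.1)):
    """Combine noise, restoration, classification and perceptual penalties.

    pred_noise:  predicted noise mask, same shape as the image batch
    pred_logit:  raw poisoned/safe score per image, shape (N,)
    is_poisoned: float labels in {0, 1}, shape (N,)
    weights:     hypothetical weights for the four terms
    """
    true_noise = poisoned - clean     # ground-truth poison
    restored = poisoned - pred_noise  # image after mask subtraction

    # 1. Pixel-wise noise match (L1 is an assumption).
    noise_term = F.l1_loss(pred_noise, true_noise)
    # 2. Restoration match (MSE).
    restore_term = F.mse_loss(restored, clean)
    # 3. Binary classification (BCE on logits for numerical stability).
    cls_term = F.binary_cross_entropy_with_logits(pred_logit, is_poisoned)
    # 4. Semantic anchor: compare features from a frozen network.
    with torch.no_grad():
        target_feats = feature_extractor(clean)
    perceptual_term = F.mse_loss(feature_extractor(restored), target_feats)

    w_noise, w_restore, w_cls, w_perc = weights
    total = (w_noise * noise_term + w_restore * restore_term
             + w_cls * cls_term + w_perc * perceptual_term)
    return {"total": total, "noise": noise_term, "restore": restore_term,
            "cls": cls_term, "perceptual": perceptual_term}
```

Returning the individual terms alongside the total makes it easy to log each penalty separately during training. The clean-image features are computed under torch.no_grad() because they are targets only; gradients still flow through the restored-image branch back to the predictor, even though the extractor's own weights stay frozen.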