(W34M11) Passive–Blind Forgery Detection Techniques - Part 1

3.3 Passive–Blind Forgery Detection Techniques

The forensic features used by the various image and video tamper detection techniques we are about to examine are categorized according to the scheme presented in Figure 3.2.

* A brief description of various active non-blind tamper detection solutions can be found in Appendix D.

Figure 3.2 Categorization of forensic features and analyses used for the detection of different kinds of forgeries in digital images and videos.

3.3.1 Detection of Frame–Insertion/Removal/Replication

In the upcoming sub–sections, we examine the various forensic artifacts that help detect the presence of frame–insertion, frame–removal, and frame–replication forgeries.

3.3.1.1 Sensor Artifact Based Techniques

Image/video acquisition devices generally leave unique, identifiable traces (mainly noise) in the content they record. This noise is the result of the inhomogeneity of the imaging sensors (such as charge coupled device (CCD) sensors or complementary metal oxide semiconductor (CMOS) sensors) used in the recording device; these inhomogeneities arise from the differing sensitivities of the silicon wafers present in the imaging sensors. Noise artifacts are mainly used for source camera identification, but some researchers have shown them to be useful for tamper detection as well. Sensor artifact (or sensor noise) based tamper detection techniques essentially operate by determining whether all the frames of the given video were captured by the same camera, depending on whether or not they exhibit identical noise patterns.

Read Out Noise: Digital cameras introduce read out noise into every frame they record, and this noise follows a consistent pattern over the entire video sequence. If a video suffers from frame–insertion forgery (where the inserted frames were captured using a different camera), variations in the noise patterns between frames can be used as evidence of forgery. This scheme was suggested in De et al. 2006.

Sensor Pattern Noise: SPN is another unique noise artifact that can be used for the detection of inter–frame forgeries. By comparing the reference SPN of the source camera with the test SPN estimated from the given video, frame–replication and frame–insertion forgeries can be detected: in the case of a frame–replicated video, the reference and test SPN are highly correlated (because of the presence of identical content in the replicated sets of frames), whereas in the case of a frame–inserted video, the correlation between reference and test SPN is quite low (on account of the presence of foreign frames, i.e., frames that originated from a device other than the one used to capture the given video). This method was suggested in Mondaini et al. 2007.
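As a rough illustration of how such a comparison might be carried out, the sketch below estimates a per-frame noise residual with a simple denoising filter, builds a reference SPN by averaging the residuals of frames known to originate from the source camera, and flags frames whose residual correlates poorly with that reference. The frame representation (grayscale NumPy arrays), the Gaussian filter used as the denoiser, and the threshold are all simplifying assumptions; published SPN schemes typically use wavelet-based denoising and more careful estimation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(frame):
    """Estimate the noise residual of a grayscale frame as the frame
    minus a denoised version of itself (a Gaussian filter is used here
    as a simple stand-in for a wavelet denoiser)."""
    frame = frame.astype(np.float64)
    return frame - gaussian_filter(frame, sigma=2)

def reference_spn(frames):
    """Average the residuals of frames known to come from the source
    camera to obtain a reference sensor pattern noise estimate."""
    return np.mean([noise_residual(f) for f in frames], axis=0)

def normalized_correlation(a, b):
    """Normalized cross-correlation between two residual patterns."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def flag_foreign_frames(frames, ref_spn, threshold=0.05):
    """Return indices of frames whose residual correlates poorly with
    the reference SPN (candidate foreign/inserted frames). The threshold
    is illustrative and would need calibration on authentic footage."""
    return [i for i, f in enumerate(frames)
            if normalized_correlation(noise_residual(f), ref_spn) < threshold]
```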

Photon Shot Noise: Presence of foreign frames can also be detected by examining variations in the photon shot noise patterns in different frames of the video in question, as suggested in Kobayashi et al. 2010.

3.3.1.2 Recompression Detection Based Techniques

This category of forensic techniques draws inspiration from the chain of events involved in the creation of a typical inter–frame forgery: any alteration of the pre–established sequence of frames in a given (compressed) video requires the forger to first extract the individual frames from the video, then perform the forgery (say, frame–removal), and finally combine the altered series of frames to create a new video. This final step is essentially a re–encoding process, which involves a certain amount of compression. This compression is actually a recompression, since it is in addition to the initial compression the video underwent when it was first created. Because recompression is an unavoidable consequence of tampering in the compressed domain, it can be treated as evidence of post–production content modification. Some of the earliest known inter–frame tamper detection schemes detected forgeries in compressed videos by detecting the presence of recompression artifacts, which manifest in detectable forms in certain content features, such as the ones discussed below.

Discrete Cosine Transform (DCT) Coefficients: When a compressed video undergoes frame–insertion or frame–removal followed by recompression, its pre–established GOP structure gets desynchronized (the GOP structure refers to the arrangement of I–, P–, and B–frames in a compressed video), which causes the DCT coefficients of the video frames to diverge from their usual distribution patterns. The presence of such deviations serves as evidence of recompression and, subsequently, of inter–frame tampering.

Consider Figure 3.3, which demonstrates how the process of frame–removal desynchronizes the pre–established GOP structure of a compressed video. Figure 3.4 illustrates the effect of this desynchronization on the DCT coefficients of a recompressed video frame. For the sake of simplicity, this example demonstrates the effect of removing three frames. (Note that, in reality, deleting a mere three frames will not accomplish anything meaningful, considering that deleting even one second of a video recorded at 25 or 30 FPS requires at least 25–30 frames to be removed.)

Figure 3.3 GOP Desynchronization. The top row depicts the GOP structure of an original MPEG video. The middle row demonstrates how the frames move from one GOP sequence to another after three frames are removed. The third row depicts the new, re-ordered GOP structure of the tampered video. This GOP re-arrangement gives rise to certain artifacts that can be used as evidence of the presence of recompression. [Image Courtesy of Wang and Farid 2006]

Figure 3.4 Comb-Like Artifacts of Double Compression. These figures depict histograms of DCT coefficients of (a) a singly-compressed frame and (b) a double-compressed version of the same frame. Note the periodicities and the empty bins in the histogram of the double-compressed frame; these are the forensic artifacts that lead to the detection of recompression. [Images Courtesy of Wang and Farid 2006]
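The comb-like pattern in Figure 3.4 is easy to reproduce with a toy simulation: quantizing a set of DCT-like coefficients with one step size and then requantizing them with a different step size leaves some histogram bins periodically empty (or exaggerated). The snippet below is only an illustration of that double-quantization effect on synthetic coefficients; it is not the detector of Wang and Farid 2006, and the step sizes are arbitrary.

```python
import numpy as np

def quantize(coeffs, q):
    """Quantize coefficients with step size q (round to the nearest multiple)."""
    return np.round(coeffs / q) * q

# Toy DCT coefficients drawn from a Laplacian-like distribution, which
# roughly mimics the AC coefficients of natural image blocks.
rng = np.random.default_rng(0)
coeffs = rng.laplace(loc=0.0, scale=8.0, size=100_000)

q1, q2 = 5, 3                                # first and second quantization steps
single = quantize(coeffs, q2)                # compressed only once (step q2)
double = quantize(quantize(coeffs, q1), q2)  # compressed with q1, then recompressed with q2

# Histogram over bins centred on multiples of q2: the doubly-quantized
# histogram shows periodic empty bins -- the comb artifact.
bins = np.arange(-61.5, 63, 3)
h_single, _ = np.histogram(single, bins=bins)
h_double, _ = np.histogram(double, bins=bins)
print("singly compressed: ", h_single[15:25])
print("doubly compressed: ", h_double[15:25])
```

In a real analysis, the same kind of histogram would be built from the quantized DCT coefficients extracted from the frames of the video under investigation.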

DCT coefficient irregularities have been used for detecting frame–insertion/removal forgeries in Wang and Farid 2006, Wang and Farid 2009, Su and Xu 2010, Su et al. 2011, Milani et al. 2012, Xu et al. 2012, and Sun et al. 2012.

Block Artifact Strength: Block–based lossy video encoding schemes, such as MPEG, quantize coefficients in neighboring pixel blocks of a frame separately, which introduces visible artifacts along the block boundaries, known as block artifacts, blocking artifacts, or block boundary artifacts (see Figure 3.5).

Figure 3.5 Block Boundary Artifacts. While these artifacts are quite prominent in the entire frame in (a), they are more readily visible in the background area in the frame in (b). [Original Videos Courtesy of ITV, BBC One]

According to Luo et al. 2008, block artifacts persist even after a video is recompressed following frame–removal. Therefore, by detecting (and quantifying) the change in the Block Artifact Strength (BAS) of the recompressed video relative to its singly–compressed version, the presence of frame–removal forgeries can be detected.
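In the spirit of the BAS measure, blockiness can be quantified by comparing pixel differences that straddle 8×8 block boundaries with differences of the same orientation inside the blocks; the gap between the two grows as block artifacts become stronger. The sketch below is a simplified stand-in for the measure used in Luo et al. 2008 and assumes grayscale frames supplied as NumPy arrays.

```python
import numpy as np

def block_artifact_strength(frame, block=8):
    """Crude blockiness score for a grayscale frame: mean absolute pixel
    difference across vertical and horizontal block boundaries minus the
    mean absolute difference at non-boundary positions."""
    f = frame.astype(np.float64)
    dh = np.abs(np.diff(f, axis=1))   # differences between neighbouring columns
    dv = np.abs(np.diff(f, axis=0))   # differences between neighbouring rows

    cols = np.arange(dh.shape[1])
    rows = np.arange(dv.shape[0])
    h_boundary = (cols + 1) % block == 0   # differences that cross a vertical block edge
    v_boundary = (rows + 1) % block == 0   # differences that cross a horizontal block edge

    boundary = np.concatenate([dh[:, h_boundary].ravel(), dv[v_boundary, :].ravel()])
    interior = np.concatenate([dh[:, ~h_boundary].ravel(), dv[~v_boundary, :].ravel()])
    return boundary.mean() - interior.mean()

# A video-level BAS curve can then be obtained by applying this measure
# frame by frame and looking for abnormal jumps after suspected recompression.
```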

Variation of Prediction Footprint: The authors in Vázquez–Padín et al. 2012 proposed a novel and robust forensic feature called the Variation of Prediction Footprint (VPF), which demonstrably appears only in those P–frames of a recompressed video that were I–frames when the video was first compressed.

Figure 3.6 presents an example that demonstrates VPF artifacts. In this figure, the red blocks indicate ‘intra–coded macroblocks (I–MBs)’, i.e., MBs that are predicted from MBs present in the same frame; the blue blocks indicate ‘inter–coded macroblocks (P–MBs)’, i.e., MBs that are predicted from MBs present in the neighboring frames; and the green blocks indicate ‘skipped macroblocks (S–MBs)’, i.e., MBs that are not predicted but deduced from the previously decoded MBs.

Figure 3.6 VPF Artifacts. These are three successive frames from a double-compressed video, where (a) and (c) were P-frames when the video was singly-compressed and remained P-frames when the video was recompressed. The frame depicted in (b), however, was an I-frame in the original video but became a P-frame after recompression. The increased presence of red MBs (I-MBs) in (b) as compared to (a) and (c) is referred to as VPF, and is considered indicative of the presence of recompression. [Images Courtesy of Vázquez-Padín et al. 2012]
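In practice, VPF-style analyses count the intra-coded macroblocks in every P-frame and look for isolated spikes, which in a recompressed video recur with the period of the original GOP. The sketch below assumes that per-frame I-MB counts have already been extracted with a decoder (doing so is codec- and tool-specific) and uses a simple neighbourhood comparison with illustrative thresholds; it is not the decision rule of Vázquez–Padín et al. 2012.

```python
import numpy as np

def vpf_candidates(imb_counts, window=2, ratio=3.0, min_count=10):
    """Flag P-frames whose intra-coded macroblock (I-MB) count is much
    larger than that of the surrounding P-frames -- the 'variation of
    prediction footprint' left by frames that were I-frames before
    recompression.

    imb_counts: sequence of I-MB counts, one entry per P-frame in display
    order (assumed to be obtained from a decoder). window, ratio, and
    min_count are illustrative parameters."""
    counts = np.asarray(imb_counts, dtype=np.float64)
    flagged = []
    for i, c in enumerate(counts):
        lo, hi = max(0, i - window), min(len(counts), i + window + 1)
        neighbours = np.delete(counts[lo:hi], i - lo)
        baseline = neighbours.mean() if neighbours.size else 0.0
        if c >= min_count and c > ratio * (baseline + 1.0):
            flagged.append(i)
    return flagged

# Example: the spike at index 4 is reported as a VPF candidate.
print(vpf_candidates([3, 2, 4, 3, 120, 2, 5, 3]))
```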

VPF was used for the detection of recompression and inter–frame tampering in Labartino et al. 2013, Gironi et al. 2014, and Sitara and Mehtre 2017.

3.3.1.3 Motion and Brightness Features Based Techniques

Although recompression artifacts can often provide valuable clues regarding the processing history of the video in question, to state that their presence is always indicative of tampering would be rather unwise, simply because perfectly authentic digital content that we come across in our day-to-day lives has often undergone multiple post–production compressions. All the tamper detection techniques we reviewed in the previous section relied much too heavily on the forensic evidence offered by recompression artifacts, and consequently failed to differentiate tampered videos from harmlessly recompressed ones.

Since recompression artifacts do not offer conclusive evidence of tampering* in realistic forgery scenarios, alternative forensic artifacts have to be relied upon; these are discussed next.

* Although the presence of recompression is not always indicative of post–production manipulation, its absence is a strong indicator of content fidelity, primarily because it would be next to impossible for a forger to manipulate the contents of an (undoubtedly compressed) image or video without introducing a second, however small, round of compression. That said, ‘removing’ traces of a previous compression is an entirely different matter, and we will learn more about that in Module 3.

Motion Compensated Edge Artifacts (MCEA): MCEAs are unique artifacts that have been shown to appear in videos compressed with coding algorithms that use block–based motion–compensated frame prediction (such as MPEG–2). During motion–compensated frame prediction, successive video frames are decoded with the help of previously decoded frames, which makes consecutive frames dependent on one another. Inter–frame forgeries destroy such dependencies or correlations, thereby causing the existing block boundary artifacts in the video frames to become even more prominent. This increase in block boundary artifacts, which is referred to as MCEA, can help detect the presence of inter–frame forgeries. Figure 3.7 demonstrates the effect of frame–removal followed by recompression on MCEA energy.

Figure 3.7 MCEAs. These figures depict MCEA energy spectra for (a) an authentic video and (b) the same video after it was recompressed following frame-removal. The characteristic peaks in the spectrum in (b) are considered to be evidence of manipulation. [Images Courtesy of Dong et al. 2012]

MCEAs have been used as discriminating features for the detection of videos suffering from frame–removal forgeries in Su et al. 2009 and Dong et al. 2012.

Motion/Prediction Residue: At the time of its creation, a video exhibits an enormous amount of temporal redundancy, which the video encoder takes advantage of by predicting certain frames from other frames. The prediction residual, or prediction error, which is the difference between the original video frame and the frame predicted by the video encoder, follows a particular pattern across a series of consecutive frames in an authentic video. These patterns are established when the video is first acquired, compressed, and saved. If this video is subjected to post–production tampering that affects its frame sequence in any way, the pre–established residual patterns get disrupted, resulting in certain statistical abnormalities that can be discovered via suitable analyses (see Figures 3.8 and 3.9).

Figure 3.8 Frame-Removal Artifacts. These plots depict magnitudes of DFT (Discrete Fourier Transform) of prediction error sequences for (a) an original video and (b) its frame-deleted version (which is recompressed after the forgery). The characteristic peaks in (b) (indicated by the arrows) are the artifacts of the forgery. [Images courtesy of Stamm et al. 2012]

Figure 3.9 Detection of Frame-Replication. These plots illustrate prediction residual patterns for an original video and its tampered version, containing a series of replicated frames. The replication is evident from the repeated patterns in the plots (the repeated patterns are marked with bounding boxes). [Images Courtesy of Singh and Aggarwal 2017a]
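The periodic peaks visible in Figure 3.8 can be searched for automatically. Since access to the encoder's true prediction error is codec-dependent, the sketch below approximates the residual sequence by the mean absolute difference between consecutive decoded frames, takes its DFT, and measures how far the strongest non-DC component rises above the rest of the spectrum; the choice of proxy and the scoring are simplifying assumptions rather than the procedure of Stamm et al. 2012.

```python
import numpy as np

def residual_sequence(frames):
    """Proxy for the per-frame prediction error: mean absolute difference
    between consecutive grayscale frames (NumPy arrays)."""
    frames = [f.astype(np.float64) for f in frames]
    return np.array([np.mean(np.abs(b - a)) for a, b in zip(frames, frames[1:])])

def periodic_peak_score(residuals):
    """Return the frequency (cycles per frame) and relative strength of the
    strongest non-DC spectral component of the mean-removed residual
    sequence. A component rising well above the median spectrum magnitude
    hints at the periodic artifacts left by frame removal followed by
    recompression."""
    r = residuals - residuals.mean()
    spectrum = np.abs(np.fft.rfft(r))
    spectrum[0] = 0.0                         # ignore the DC term
    k = int(np.argmax(spectrum))
    noise_floor = np.median(spectrum[1:]) + 1e-12
    return k / len(r), float(spectrum[k] / noise_floor)
```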

Prediction residual abnormalities have been used as forensic features in Stamm et al. 2012, Shanableh 2013, Kancherla and Mukkamal 2012, Liu et al. 2014, Kang et al. 2015, Aghamaleki and Behrad 2016a and 2016b, Yu et al. 2016, Mathai et al. 2016, and Singh and Aggarwal 2017a.

Optical Flow and Brightness Variance: Optical flow, which refers to the pattern of apparent motion of objects, surfaces, and edges across successive video frames, is yet another useful forensic feature that enables the detection of inter–frame forgeries. In an authentic video, optical flow variations between successive frames remain more or less consistent, but if the pre–established frame sequence is disturbed, the optical flow starts exhibiting certain irregularities, which, once detected, can serve as a fingerprint of tampering.

Figure 3.10 presents an example that illustrates the effect of frame–removal on optical flow; to better highlight the disruption caused by frame tampering, FFT* (Fast Fourier Transform) coefficients of the optical flow are computed and their variance is plotted for an original video and its tampered version.

Figure 3.10 Optical Flow Inconsistencies. Plots of Vefft values (i.e., variance of FFT coefficients of optical flow) for successive frames of an original video and its frame-deleted version. [Images Courtesy of Singh and Aggarwal 2017a]

The notable variations in the optical flow pattern of the frame–tampered video, in contrast to that of the original video, are due to the abrupt discontinuities introduced into the optical flow sequence when a set of frames is removed from the video.

Another way to visualize the effect of frame–tampering on optical flow consistency is to plot the magnitudes of the FFT coefficients of optical flow for every pair of adjacent frames of the tampered video. This way, we can localize the exact point from which the frames were removed (see Figure 3.11).

* Fingerprints of most forgeries become evident upon analysis of forensic features in the frequency domain. This is why a majority of the researchers choose to conduct their analyses in the frequency domain, by using transforms such as FFT, DCT, and DFT.

Figure 3.11 Optical Flow Artifacts. These figures depict mesh plots of the magnitudes of efft values (i.e., FFT coefficients of optical flow) for successive frames of a tampered video. As we can observe from these plots, while the optical flow remains consistent for most pairs of adjacent frames (i.e., frame pairs 5-6, 33-34, and 88-89), frame pair 135-136 generates a strikingly dissimilar pattern, which is a consequence of the sudden discontinuity in optical flow that occurs whenever the pre-established frame sequence gets disrupted. This indicates that frames 135 and 136 are not truly adjacent but are in fact pseudo-adjacent, i.e., they became adjacent when some intermediary frames were removed and the remaining frames were combined to create a new video. [Images Courtesy of Singh and Aggarwal 2017a]
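A localization analysis along these lines can be sketched with OpenCV's dense optical flow: compute the flow field for every pair of adjacent frames, summarize each field by the variance of the magnitudes of its FFT coefficients (in the spirit of the Vefft feature), and flag the pair whose value stands out from the rest of the sequence. The Farneback flow, the z-score rule, and the threshold below are illustrative choices, not the decision procedure of Singh and Aggarwal 2017a.

```python
import cv2
import numpy as np

def flow_fft_variance(prev_gray, next_gray):
    """Variance of FFT-coefficient magnitudes of the dense optical flow
    computed between two consecutive 8-bit grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)        # per-pixel flow magnitude
    return float(np.var(np.abs(np.fft.fft2(magnitude))))

def suspicious_pairs(gray_frames, z=4.0):
    """Flag frame pairs whose flow statistic is an outlier with respect to
    the whole sequence (simple z-score rule; the threshold is illustrative)."""
    values = np.array([flow_fft_variance(a, b)
                       for a, b in zip(gray_frames, gray_frames[1:])])
    zscores = np.abs(values - values.mean()) / (values.std() + 1e-12)
    return [(i, i + 1) for i in np.flatnonzero(zscores > z)]
```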

Optical flow has been used as a forensic feature in Chao et al. 2013, Zheng et al. 2014, Kingra et al. 2017, and Singh and Aggarwal 2017a.

Velocity Field Consistency: The velocity field refers to the displacement between neighboring video frames caused by their time separation. While the velocity field follows a consistent pattern in an authentic video, the authors in Wu et al. 2014 demonstrated that inter–frame tampering such as frame–removal or frame–replication causes the velocity field sequences to exhibit discernible irregularities, thereby enabling detection of said tampering (Figure 3.12).

Figure 3.12 Velocity Field Inconsistencies. These figures depict horizontal and vertical velocity field intensities (respectively) for an (a, b) un-tampered video, (c, d) tampered version of the video suffering from frame-removal, and (e, f) tampered version of the video suffering from frame-replication. [Images Courtesy of Wu et al. 2014]

The characteristic spikes in the intensity fields of the tampered videos are considered to be artifacts of the forgery.

3.3.1.4 Pixel–Level Analysis Based Techniques

Pixel–correlation analysis is among the most intuitive methods for the detection of frame–duplication forgeries. Frame–duplication forgeries basically involve the replication of a set of frames at another temporal location within the same video, which means the final tampered video essentially contains multiple instances of the same set of frames. Due to the presence of identical frames, pixel–correlations among consecutive frames of such a video become abnormally high. Note that although a certain amount of pixel–correlation is always present among successive frames of a natural video (because, in order to create the illusion of continuous motion, adjacent frames of a video have to be somewhat alike), this correlation is not nearly as high as that present in a frame–replicated video. This methodology has been used in Wang and Farid 2007a and 2007b, Lin et al. 2011, Lin and Chang 2012, Bestagini et al. 2013a, Li and Huang 2013, Yang et al. 2016, Liu and Huang 2017, Ulutaş et al. 2017a and 2017b, Huang et al. 2017, and Wei et al. 2017.
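A minimal version of this analysis computes a correlation coefficient between every pair of sufficiently separated frames and reports pairs whose correlation approaches that of identical frames. The sketch below assumes grayscale frames as NumPy arrays; the gap and threshold are illustrative, and practical detectors add sub-sequence matching and extra verification to keep the quadratic comparison tractable and to avoid false matches in static scenes.

```python
import numpy as np

def frame_correlation(a, b):
    """Pearson correlation coefficient between two grayscale frames."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def duplicated_frame_pairs(frames, min_gap=10, threshold=0.999):
    """Report pairs of temporally distant frames that are almost identical,
    i.e., candidates for frame duplication/replication. min_gap keeps
    naturally similar neighbours out of the comparison; both parameters
    are illustrative."""
    pairs = []
    for i in range(len(frames)):
        for j in range(i + min_gap, len(frames)):
            if frame_correlation(frames[i], frames[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```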

3.3.2 Detection of Temporal Splicing

Temporal splicing is the process of merging the frames of two or more separate videos to create a new, fraudulent video. If the source videos have different frame rates, they must be temporally interpolated before they can be spliced together. This is done with the help of an operation called Frame Rate Up–Conversion (FRUC), or frame interpolation, whereby new frames are created from existing ones and inserted into the given video, thus increasing its frame rate. Techniques that detect temporal splicing therefore essentially try to detect FRUC.

The authors in Bestagini et al. 2013b demonstrated that frame interpolation, when performed with the help of motion compensation, introduces certain discernible artifacts that can be used as evidence of the presence of FRUC (see Figure 3.13).

In this figure, ‘|F(PE)|’ denotes the ‘Magnitudes of Fourier Coefficients of Prediction Error’ for the entire video sequence.

Figure 3.13 FRUC Artifacts. This figure depicts the patterns of prediction error sequences for an authentic video (denoted by the plot in blue) and an interpolated version of the video (denoted by the plot in red). Note the periodic artifacts in the prediction error sequence of the interpolated video; these are the artifacts that help separate authentic videos from interpolated ones. [Image Courtesy of Bestagini et al. 2013b]

FRUC also introduces periodic artifacts in the inter–frame similarity sequences of up–converted videos, which can be used for the detection and identification of interpolated frames, as suggested in Bian et al. 2014a and Bian et al. 2014b.
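As a rough sketch of this idea, one can build an inter-frame similarity sequence (a plain correlation between consecutive frames is used below as a stand-in for more elaborate similarity measures) and flag frames that are unusually similar to both of their neighbours, as interpolated frames tend to be. The similarity measure and the margin parameter are assumptions made for brevity, not the method of Bian et al.

```python
import numpy as np

def frame_similarity(a, b):
    """Correlation between two grayscale frames (a crude stand-in for the
    similarity measures used in the published schemes)."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def interpolated_frame_candidates(gray_frames, margin=0.5):
    """Flag frames that are unusually similar to BOTH of their neighbours,
    as interpolated frames tend to be. 'margin' controls how far above the
    typical inter-frame similarity a frame must lie (illustrative value)."""
    sims = np.array([frame_similarity(a, b)
                     for a, b in zip(gray_frames, gray_frames[1:])])
    typical, spread = np.median(sims), np.std(sims) + 1e-12
    candidates = []
    for i in range(1, len(gray_frames) - 1):
        if (sims[i - 1] > typical + margin * spread
                and sims[i] > typical + margin * spread):
            candidates.append(i)
    return candidates
```

In an up-converted video, the flagged indices should recur with a fixed period corresponding to the up-conversion ratio, and it is precisely this periodicity that the published detectors exploit.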

When a new frame is generated by merging two different frames, a certain amount of blurring occurs, especially in those pixels that lie at the boundaries of the objects present in the frames. As a result, the intensity of the edges of a particular object is lower in the interpolated frame than in the original frames (i.e., the frames that were used to generate the interpolated frame). Furthermore, since interpolated frames are interleaved periodically into the original video, the reduction in edge intensities in an up–converted video exhibits a certain periodicity along the temporal dimension. The presence of this periodicity can serve as evidence of the presence of interpolated frames, as demonstrated in Yao et al. 2016 (Figure 3.14).

Figure 3.14 Edge Intensity Artifacts. This figure shows a comparison of object edge intensities for an original sequence (the plot depicted in purple) and its up-converted version (the plot depicted in green). Note the periodic drop in edge intensities in the up-converted video; these drops correspond to the interpolated frames in the tampered video. [Image Courtesy of Yao et al. 2016]
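The periodic dips in Figure 3.14 suggest an equally simple check: compute a per-frame edge-strength value (the mean Sobel gradient magnitude below) and inspect the resulting sequence for regularly spaced drops, for instance via its spectrum. The sketch assumes grayscale frames and is an illustration of the idea behind Yao et al. 2016, not a reimplementation of their measure.

```python
import numpy as np
from scipy.ndimage import sobel

def edge_intensity(frame):
    """Mean gradient magnitude of a grayscale frame (Sobel operator)."""
    f = frame.astype(np.float64)
    gx = sobel(f, axis=1)
    gy = sobel(f, axis=0)
    return float(np.mean(np.hypot(gx, gy)))

def periodic_dip_strength(gray_frames):
    """Build the per-frame edge-intensity sequence and return the index and
    relative strength of its strongest non-DC spectral component;
    periodically interpolated frames cause regularly spaced dips, which show
    up as a pronounced peak in this spectrum."""
    e = np.array([edge_intensity(f) for f in gray_frames])
    e = e - e.mean()
    spectrum = np.abs(np.fft.rfft(e))
    spectrum[0] = 0.0
    k = int(np.argmax(spectrum))
    return k, float(spectrum[k] / (np.median(spectrum[1:]) + 1e-12))
```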

FRUC has also been observed to introduce discernible periodic artifacts in the texture regions of the affected (interpolated) frames. The scheme suggested in Xia et al. 2016 relies on the detection of such artifacts (Figure 3.15).

Figure 3.15 Texture Artifacts. This figure compares ATV (Average Texture Variation) curves for an original video and its up-converted versions created using two FRUC methods: FA (Frame Averaging) and MCI (Motion Compensated Interpolation). It is evident from the plot that while the average texture variations for the frames of the original video are quite consistent, those of the up-converted videos exhibit prominent periodic peaks of large magnitude. These periodic artifacts indicate that, compared to videos that have not undergone interpolation, up-converted videos exhibit far greater texture variation between successive frames. [Image Courtesy of Xia et al. 2016]
