Information


This project focuses on detecting image tampering using shadow-based feature analysis. It targets urban infrastructure imagery, where ground-level images are commonly used to monitor development progress and inform planning decisions.

The system uses pixel-based algorithms to extract features related to texture, lighting, and depth, and it evaluates them using a dual-path approach: a rule-based method that applies threshold-driven logic and a machine learning pipeline that learns patterns from the same engineered features.

A web application allows users to upload images and receive a tampering likelihood score with supporting analysis details. By combining explainable heuristics with data-driven models, we aim to improve the reliability and authenticity of visual data used in urban planning and reporting.



FAQ

Images of urban infrastructure with visible shadows and clear lighting conditions work best.

Currently only JPEG files (.jpg or .jpeg extensions) are accepted.

Analysis may take up to a minute. The system runs three separate feature extraction pipelines (texture, lighting, and depth) in parallel, so processing time varies with image size and complexity.
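
As a rough sketch, that concurrency could look like the snippet below. The extract_* function names are placeholders, not the project's actual API:

```python
# Minimal sketch of running the three pipelines at the same time; the
# extract_* functions are illustrative stand-ins for the real modules.
from concurrent.futures import ThreadPoolExecutor

def extract_texture(image): ...   # placeholder module entry points
def extract_lighting(image): ...
def extract_depth(image): ...

def analyze(image):
    """Run all three feature extraction pipelines concurrently."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "texture": pool.submit(extract_texture, image),
            "lighting": pool.submit(extract_lighting, image),
            "depth": pool.submit(extract_depth, image),
        }
        # result() blocks until each module finishes, so wall-clock time is
        # roughly the slowest pipeline rather than the sum of all three.
        return {name: f.result() for name, f in futures.items()}
```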

N/A appears when a module's score fell in the uncertainty zone (between 0.45 and 0.65) or when an error occurred during that module's analysis. It means that the module did not cast a vote, not that the image is safe.

No, images are processed in memory and discarded after analysis. No images are saved.

Inconclusive means the system could not reach a consensus verdict. This happens in two situations: the modules that cast votes split evenly between Real and Tampered (a tie), or every module returns a borderline score and no votes are cast at all.
This is different from N/A, which is a per-module state saying one specific module did not produce a result. Inconclusive is the final consensus when the overall picture is too ambiguous to call.
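
A minimal sketch of that consensus rule, assuming each module has already been reduced to a vote; the function name and labels are illustrative, not the project's actual code:

```python
# Consensus over per-module votes, as described above.
def consensus(votes):
    """votes: one entry per module, "tampered", "real", or None (no vote)."""
    cast = [v for v in votes if v is not None]
    if not cast:                 # every module was borderline or errored
        return "Inconclusive"
    tampered = cast.count("tampered")
    real = cast.count("real")
    if tampered == real:         # the cast votes split evenly: a tie
        return "Inconclusive"
    return "Tampered" if tampered > real else "Real"
```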

Yes. Very large images may exceed the server's upload size limit and return an error. If this happens, you will see a message indicating the server did not return a valid response. Try reducing the file size by resizing or re-exporting the image at a lower resolution before uploading.
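
One way to shrink an image before re-uploading is with Pillow. The file names, the 2000-pixel cap, and the quality setting below are illustrative, not the server's actual limits:

```python
# Resize an image in place and re-export it as a smaller JPEG.
from PIL import Image

img = Image.open("site_photo.jpg")
img.thumbnail((2000, 2000))  # shrinks in place, preserving aspect ratio
img.save("site_photo_small.jpg", "JPEG", quality=85)
```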

The system extracts shadow-related features from the image and evaluates them using both rule-based scoring and a machine learning pipeline.

Each module (texture, lighting, and depth) outputs a score between 0 and 1. Scores at or above 0.65 are flagged as likely tampered, scores at or below 0.45 are considered likely real, and anything in between falls into an uncertainty zone where no vote is cast.
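
A minimal sketch of that per-module thresholding; the cutoffs come from the description above, while the function name and labels are illustrative:

```python
# Convert a module score in [0, 1] into a vote, or no vote at all.
def module_vote(score):
    if score >= 0.65:
        return "tampered"   # flagged as likely tampered
    if score <= 0.45:
        return "real"       # considered likely real
    return None             # uncertainty zone: no vote cast, shown as N/A
```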

The rule-based method applies hand-crafted thresholds to shadow features and uses a majority vote. The ML stacked model feeds probabilities from three learned module models into a final classifier. Seeing both side by side lets you compare how each approach interprets the same image and evidence.
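
A hedged sketch of the stacking step: each module model emits one probability, and a simple final classifier combines the three. Logistic regression and the toy numbers below are assumptions for illustration, not the project's actual meta-model or data:

```python
# Stacked classification over per-module probabilities (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [texture_prob, lighting_prob, depth_prob] from the module models.
X_meta = np.array([
    [0.80, 0.70, 0.60],  # example tampered image
    [0.20, 0.30, 0.40],  # example real image
    [0.90, 0.85, 0.75],  # example tampered image
    [0.10, 0.25, 0.35],  # example real image
])
y = np.array([1, 0, 1, 0])  # 1 = tampered, 0 = real

meta = LogisticRegression()
meta.fit(X_meta, y)
new_image_probs = [[0.70, 0.50, 0.55]]
print(meta.predict_proba(new_image_probs)[0, 1])  # overall tampering probability
```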

Disagreement between rule-based and ML results is normal and actually informative. It signals an ambiguous or borderline case. Neither result should be treated as a final verdict on its own.

The evidence section visualizes what the system examined so the user can inspect it further. For texture: detected shadow regions and LBP texture patterns. For lighting: shadow components colored by light source and a brightness-ratio heatmap. For depth: boundary sampling points for penumbra measurement and elongation bounding boxes. Only modules flagged as suspicious show evidence images.
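
As an illustration of the texture evidence, uniform LBP codes can be computed with scikit-image; the parameters and the random stand-in image below are placeholders, while the real pipeline runs this over detected shadow regions:

```python
# Compute uniform LBP codes and a compact texture histogram for a region.
import numpy as np
from skimage.feature import local_binary_pattern

gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in image
codes = local_binary_pattern(gray, P=8, R=1, method="uniform")

# With P=8 "uniform" LBP, codes fall in the range 0..9; their histogram
# serves as a texture signature that can be compared across regions.
hist, _ = np.histogram(codes, bins=10, range=(0, 10))
```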

The system is specifically designed to detect shadow inconsistencies that suggest content has been added, removed, or composited into an image. For example, a pasted object whose shadow does not match the scene's lighting direction or softness.

Initially, a tie was treated as Real because this tool is designed as supporting evidence, not a final verdict. A tied vote now returns Inconclusive rather than defaulting to Real, because a tie between modules means the evidence is genuinely split. Forcing a Real verdict would hide that ambiguity and make the output less honest. Inconclusive is the better answer: the system does not have enough agreement to make a call, and a human reviewer should take a closer look.

When the result is Inconclusive, the system shows the mean rule-based score across all modules that produced a valid number, along with a "leans suspicious" or "leans real" note. This is not a verdict; it is a soft indicator of which direction the evidence tilts when the modules cannot agree. It is most useful as a prompt for where to focus a manual review, not as a confidence score.
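
A minimal sketch of that soft indicator. The 0.55 cutoff (the midpoint of the 0.45 to 0.65 uncertainty zone) and the function name are assumptions:

```python
# Average the valid rule-based scores and report which way they lean.
def soft_indicator(scores):
    """scores: per-module rule-based scores, with None for N/A modules."""
    valid = [s for s in scores if s is not None]
    if not valid:
        return None
    mean = sum(valid) / len(valid)
    lean = "leans suspicious" if mean > 0.55 else "leans real"
    return mean, lean
```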

No, these results are an estimate based on the detected shadow evidence and should only be used as a supporting analysis rather than definitive proof.

The entire pipeline depends on detecting shadow regions first. If no shadows are found, all three modules fall back to zero or N/A scores. A tampered region that casts no shadow, or an image without clear shadows, cannot be meaningfully analyzed.

No. The system is specifically designed to catch shadow inconsistencies from compositing or cloning edits. It will not detect tampering that does not disturb shadows, such as brightness adjustments, color grading, object removal from shadow-free areas, metadata changes, or AI-generated content.

Yes, sophisticated compositing that carefully matches shadow texture, lighting ratios, and penumbra characteristics may go undetected. The system should not be used as a definitive detector.

All three modules are built on top of the shadow masking step. If the mask misclassifies a region (for example, treating a dark road surface as a shadow, or missing a faint shadow), the downstream analysis works from incorrect input. This is the single most impactful point of failure in the pipeline.
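
A toy illustration of why a purely pixel-based mask is fragile: a plain luminance threshold marks every dark pixel as "shadow", so dark asphalt and a true shadow look identical. The threshold and approach below are illustrative, not the project's exact masking method:

```python
# Naive luminance-threshold "shadow" mask and its failure mode.
import numpy as np

def naive_shadow_mask(gray, threshold=60):
    """gray: 2-D uint8 array. Flags every sufficiently dark pixel."""
    return gray < threshold

gray = np.full((4, 4), 200, dtype=np.uint8)
gray[2:, :] = 40                       # a shadow, or just dark asphalt?
print(naive_shadow_mask(gray).sum())   # 8 pixels flagged either way
```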

The lighting consistency check needs at least two separate, usable shadow regions to compare against each other. Images with only one shadow or where only one shadow passes the quality filters cannot produce a lighting vote.

The system is a research prototype trained on a limited urban infrastructure dataset. Accuracy is not guaranteed on all image types, and results should be treated as supporting analysis rather than definitive proof.

Handcrafted features (LBP texture, brightness log-ratios, penumbra widths, elongation ratios) are grounded in the physics of how light and shadows work. This makes the system interpretable: you can point to a specific shadow boundary and explain why a score was triggered. A deep learning model learns features that are harder to explain, which matters when the output is intended to support human decision-making rather than replace it.
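
As a sketch of one such feature, the log-ratio of mean brightness between a shadow region and its adjacent lit surface can be computed directly from pixel intensities. Under a single light source this ratio tends to be consistent across a scene, so an outlier shadow stands out. The region pairing is assumed to be given, and the intensities are toy values:

```python
# Brightness log-ratio between a shadow and its adjacent lit surface.
import numpy as np

def brightness_log_ratio(shadow_pixels, lit_pixels, eps=1e-6):
    """Inputs: 1-D arrays of grayscale intensities for a paired region."""
    return np.log((shadow_pixels.mean() + eps) / (lit_pixels.mean() + eps))

shadow = np.array([40.0, 45.0, 42.0])
lit = np.array([180.0, 175.0, 190.0])
print(brightness_log_ratio(shadow, lit))  # negative: shadow darker than lit
```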

The two methods make different kinds of errors. The rule-based method applies fixed thresholds that are easy to audit but may miss subtle patterns. The ML model can generalize from training data but is less transparent. Showing both lets users see where the methods agree (increasing confidence) and where they disagree, signaling ambiguity that warrants closer inspection.

The stacked model was trained on a limited set of urban infrastructure images built for this project. Small training sets mean the model may not generalize well to unusual lighting conditions, different camera types, or tampering styles it has not seen. Results should be treated as supporting evidence accordingly.

The rule-based method is like us saying "we think a score above 0.65 means something is off, and here's why." Every threshold and weight was hand-picked based on our understanding of how shadows should behave. Ideally, it reasons the way a person would if they were manually checking an image.
The ML model takes a step back from that. Instead of us deciding what the cutoffs should be, the model figures out on its own what patterns in the features tend to show up in tampered images versus real ones. So it is less "here's what we think matters" and more "here's what the data actually showed."
The features we chose to extract still reflect our understanding of what matters in a shadow, and the model learns from the data it was trained on, so human decisions still factor in. However, the actual pattern recognition and decision-making is left to the model, and it can pick up on things we might not have thought to look for.

Replacing the pixel-based shadow masking step with a learned shadow segmentation model would likely have the largest effect, since every module depends on mask quality. After that, training on a substantially larger and more diverse dataset of urban imagery would significantly improve generalization.

Yes, the current system only looks at shadows, but future work could check for other signs of tampering.
For example, a copied region might have slightly different color tones from the rest of the image (color temperature inconsistencies), or it might have a different grain or noise texture because it came from a different camera (sensor noise pattern analysis). JPEG compression artifacts can also be analyzed.
These additional checks would work alongside the shadow analysis rather than replacing it, though expanding in this direction would shift the focus of the project.

Right now the system gives an overall score and some evidence images, but it does not highlight exactly which part of the image looks suspicious. A future version could draw a heatmap directly on the image to show the user where to look, making the results easier to interpret and act on.

With the training dataset available for this project, a complex model would overfit. The stacked architecture (three module models each producing one probability, feeding a simple final classifier) is deliberately conservative. A richer dataset would justify a richer model, but scaling the model without scaling the data first would hurt, not help, generalization.