
3D reconstruction project

Required functionalities

Each group should implement a system with the following functionality:

  1. Gold standard F-matrix estimation from known correspondences. (Many of the required functions exist in LAB3 and in OpenCV.)
  2. Computation of an E-matrix from F and known K, and subsequent extraction of R and t from E using the visible-points (cheirality) constraint (a sketch follows this list).
  3. Robust Perspective-n-Point (PnP) estimation for adding a new view and detecting outliers (see the sketch after this list).
  4. Bundle adjustment for N cameras (N≥2) using known correspondences, intrinsic camera parameters (K-matrix), and an initial guess for the camera poses.
  5. Use a sparsity mask for the Jacobian in the bundle adjustment step (this speedup is absolutely essential if you use Matlab's LSQNONLIN or scipy.optimize.least_squares). Alternatively, in C++ you can use Ceres Solver, which handles the sparsity in a different way. (A sparsity-mask sketch follows this list.)
  6. Evaluate the robustness of the implemented system to noise by measuring the camera pose errors of your reconstruction. Details can be found here.
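
For required functionality 2, a minimal Python sketch (assuming NumPy and OpenCV, the same K in both views, and purely illustrative function names) could look like this:

    import numpy as np
    import cv2

    def essential_from_fundamental(F, K):
        """E = K^T F K when both views share the calibration matrix K."""
        E = K.T @ F @ K
        # Enforce the two equal (and one zero) singular values of a valid E.
        U, S, Vt = np.linalg.svd(E)
        s = (S[0] + S[1]) / 2.0
        return U @ np.diag([s, s, 0.0]) @ Vt

    def pose_from_essential(E, pts1, pts2, K):
        """Choose the (R, t) decomposition that places the triangulated points
        in front of both cameras; cv2.recoverPose performs this visible-points
        (cheirality) test over the four candidate decompositions internally."""
        n_inliers, R, t, mask = cv2.recoverPose(E, pts1, pts2, K)
        return R, t, mask

Here pts1 and pts2 are N x 2 arrays of corresponding image points in the two views.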
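
For required functionality 3, OpenCV's solvePnPRansac provides robust PnP with an inlier mask. A hedged sketch (the helper name and the inlier threshold are just illustrative):

    import numpy as np
    import cv2

    def add_view_pnp(object_points, image_points, K):
        """Register a new view with RANSAC-based PnP, returning the pose and
        the indices of the inlier correspondences.
        object_points: (N, 3) 3D points already in the model.
        image_points:  (N, 2) their observed projections in the new image."""
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            object_points.astype(np.float64),
            image_points.astype(np.float64),
            K, None,                      # assuming no lens distortion
            reprojectionError=2.0)        # inlier threshold in pixels
        if not ok:
            raise RuntimeError("PnP failed to find a pose")
        R, _ = cv2.Rodrigues(rvec)        # rotation vector -> 3x3 rotation matrix
        return R, tvec, inliers.ravel()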
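
For required functionality 5, scipy.optimize.least_squares accepts the sparsity pattern through its jac_sparsity argument. A sketch under the assumption that each camera is parameterised by 6 pose parameters (K held fixed) and each 3D point by 3 coordinates; camera_indices and point_indices map each 2D observation to its camera and 3D point:

    import numpy as np
    from scipy.sparse import lil_matrix

    def ba_sparsity(n_cameras, n_points, camera_indices, point_indices,
                    n_cam_params=6):
        """Jacobian sparsity mask: each 2D reprojection residual depends only
        on the parameters of its own camera and its own 3D point."""
        n_obs = camera_indices.size
        m = 2 * n_obs                                  # two residuals (x, y) per observation
        n = n_cameras * n_cam_params + n_points * 3    # total number of unknowns
        A = lil_matrix((m, n), dtype=int)
        i = np.arange(n_obs)
        for k in range(n_cam_params):
            A[2 * i, camera_indices * n_cam_params + k] = 1
            A[2 * i + 1, camera_indices * n_cam_params + k] = 1
        for k in range(3):
            A[2 * i, n_cameras * n_cam_params + point_indices * 3 + k] = 1
            A[2 * i + 1, n_cameras * n_cam_params + point_indices * 3 + k] = 1
        return A

The mask is then passed as least_squares(residual_fun, x0, jac_sparsity=A, method='trf', ...), where residual_fun (your reprojection-error function) stacks its residuals in the same observation order used to build the mask.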

In addition to the above, groups with four students should implement at least one of the following functionalities:

  1. Visualisation of the 3D model. Pick one of the following:
    • Use Poisson surface reconstruction (or screened PSR) to obtain a volumetric representation, e.g. in Meshlab or Kazhdan's own implementation. This requires you to estimate colour and surface normals for the 3D points. Use visibility in cameras to determine the normal signs. See Kazhdan and Hoppe for a theoretical description.
    • Use space carving (as in Fitzgibbon, Cross and Zisserman) to generate a volumetric representation of your object.
    • Use texture patches (e.g. PMVS) or billboards, to represent the texture of the estimated 3D model.
  2. Find correspondences between pairs of views, and remove false matches using epipolar geometry and cross-checking. (This allows the implemented system to use any image set with known K; a sketch follows this list.)
  3. Use your own camera to take images of an object, and reconstruct it. This requires estimation of the K-matrix, and possibly lens distortion, followed by image rectification. Note that this also requires finding your own correspondences (functionality #2), and that you need to carefully plan how to take the images.
  4. Densify your initial sparse SfM model, using PatchMatch between selected stereo pairs, or using one of the approaches in the Multiview Stereo tutorial.
  5. Use next-best-view selection (e.g. as in Schönberger et al., or some simplified form thereof) to select the order in which cameras are added to SfM.
  6. Replace Incremental SfM with Global SfM. Use global estimation of all rotation matrices as an initialisation of Bundle Adjustment (as in Martinec and Pajdla).
  7. Write your own non-linear solver, e.g. Levenberg-Marquardt (see the report by Madsen et al.), and use the Schur complement trick to speed up the computation of the update step (see the IREG compendium).
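
For optional functionality 2, a hedged Python sketch (assuming an OpenCV build with SIFT; the function name is illustrative) of cross-checked matching followed by epipolar filtering:

    import numpy as np
    import cv2

    def matched_points(img1, img2):
        """Match SIFT features with cross-checking, then reject matches that
        violate the epipolar constraint of a RANSAC-estimated F."""
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img1, None)
        kp2, des2 = sift.detectAndCompute(img2, None)

        # crossCheck=True keeps only mutual nearest-neighbour matches.
        matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
        matches = matcher.match(des1, des2)

        pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

        # Epipolar test: keep only the RANSAC inliers of the F estimate.
        F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
        inliers = mask.ravel().astype(bool)
        return pts1[inliers], pts2[inliers], F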

Groups with five students are required to implement two of the above, and groups with six members should implement three.

Datasets

The Visual Geometry Group at Oxford University has a number of datasets available on their web site, e.g. the dinosaur sequence.

These datasets contain coordinates of image points that are tracked between images, which means that there is some amount of noise on the image coordinates, and that the points are not visible over the entire sequence, since they are occluded in some, or even most, of the images. The datasets also contain a form of ground truth in terms of estimated camera matrices for each view.

Notice that the dinosaur sequence is a turn-table sequence, generated by rotating the object around a fixed axis. The acquisition geometry of this dataset makes it sensitive to the quality of the point correspondences used.

For debugging it is useful to work with a noise-free dataset. For this purpose we have "cleaned" the dinosaur dataset of noise (up to the numerical accuracy of Matlab) and produced a dataset that contains 2D image points, 3D points, and the camera matrices. These are found in the BAdino2.mat file on the local ISY file system (a loading snippet follows below):

  • /courses/TSBB15/sequences/BAdino/BAdino2.mat
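
A minimal snippet for inspecting this file from Python (assuming SciPy is available; the variable names stored in the file are not listed here, so check them before relying on them):

    from scipy.io import loadmat

    data = loadmat('/courses/TSBB15/sequences/BAdino/BAdino2.mat')
    # List the stored variables, skipping the MAT-file header entries.
    print([key for key in data if not key.startswith('__')])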

The EPFL multi-view stereo datasets are also highly recommended.

They all have undistorted high-resolution images, with ground-truth camera poses and known intrinsic camera parameters. However, here you have to find the correspondences yourself.

Python code

Please look at the utility code for CE3. You can find it at /courses/TSBB15/python/ in the computer labs. Also look at the Extra exercises in the lab sheet; they are meant to help you get started with the projects.

Matlab code

If you use Matlab for solving the tasks in this project, there are software packages that will spare you from implementing all functionality yourself.

C code

You are absolutely NOT allowed to use complete SfM systems such as SBA, Bundler, Visual SFM, SSBA, and OpenMVG.

On the other hand, you are encouraged to use a non-linear least squares package, and if you intend to make a more advanced representation of your 3D model, you may use a package for this. Some useful (and allowed) packages are listed below.

  • Ceres Solver, a library for efficient non-linear optimization.
  • levmar and lmfit, Levenberg-Marquardt optimisation packages.
  • sparseLM, a sparse Levenberg-Marquardt optimisation package (much faster, but also more complex to set up).
  • PMVS, package to compute 3D models from images and camera poses.
  • OpenCV, implements several local invariant features, and some geometry functions.
  • Lambda twist P3P by Mikael Persson and Klas Nordberg.
  • P3P by Laurent Kneip.
  • OpenGV, a collection of geometric solvers for calibrated epipolar geometry.

Deliverables

In order to pass, each project group should do the following:

  1. Make a design plan, and get it approved by the guide.
    The design plan should contain the following:
    • A list of the tasks/functionalities that will be implemented
    • A group member list, with responsibilities for each person (e.g. what task)
    • A flow-chart of the system components
    • A brief description of the components
  2. Deliver a good presentation at the seminar.
  3. Hand in a written report to the project guide.
    The report should contain the following:
    • A group member list, with responsibilities for each person
    • A description of the problem that is solved
    • How the problem is solved
    • What the result is (i.e. performance relative to ground truth)
    • Why the result is what it is
    • References to used methods

We recommend that you use the CVPR LaTeX template when writing the report.

Last updated: 2021-01-04