# Camera position in world coordinates from cv::solvePnP

I have a calibrated camera (intrinsic matrix and distortion coefficients), and I want to know the camera position given some 3D points and their corresponding points in the image (2D points).

I know that `cv::solvePnP` could help me, and after reading two related posts I understand that the outputs of solvePnP, `rvec` and `tvec`, are the rotation and translation of the object in the camera coordinate system.

So I need to find out the camera rotation/translation in the world coordinate system.

From those posts, the code seems straightforward. In Python:

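```python
import cv2
import numpy as np

found, rvec, tvec = cv2.solvePnP(object_3d_points, object_2d_points, camera_matrix, dist_coefs)
rotM = cv2.Rodrigues(rvec)[0]  # Rodrigues returns (matrix, jacobian); [0] is the 3x3 rotation
cameraPosition = -np.matrix(rotM).T * np.matrix(tvec)
```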

I don’t know Python/NumPy (I’m using C++), but this does not make much sense to me:

• `rvec`, `tvec` output from solvePnP are 3×1 matrices (3-element vectors)
• `cv2.Rodrigues(rvec)` is a 3×3 matrix
• `cv2.Rodrigues(rvec)[0]` is a 3×1 matrix (a 3-element vector)
• `cameraPosition` is a 3×1 * 1×3 matrix multiplication, which gives... a 3×3 matrix. How can I use this in OpenGL with simple `glTranslatef` and `glRotatef` calls?

If by “world coordinates” you mean “object coordinates”, you have to invert the transformation returned by the PnP algorithm.
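The pose returned by solvePnP maps a point from the object frame into the camera frame, so inverting it tells you where the camera is. Written out, with `R` the 3×3 matrix obtained from `rvec` via `cv::Rodrigues`, and using the fact that the inverse of a rotation matrix is its transpose:

$$
\begin{aligned}
\mathbf{x}_c &= R\,\mathbf{x}_o + \mathbf{t} \\
\mathbf{x}_o &= R^\top(\mathbf{x}_c - \mathbf{t}) = R^\top\mathbf{x}_c - R^\top\mathbf{t} \\
\mathbf{C}_o &= -R^\top\mathbf{t} \qquad (\text{camera centre: } \mathbf{x}_c = \mathbf{0})
\end{aligned}
$$

The last line is a 3×3 matrix times a 3×1 vector, so the camera position is a 3×1 vector, not a 3×3 matrix.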

There is a trick for inverting rigid transformation matrices that saves you the general inversion operation, which is usually expensive, and that explains the Python code. Because `R` is a rotation matrix, its inverse is its transpose, so given a transformation `[R|t]`, we have `inv([R|t]) = [R'|-R'*t]`, where `R'` is the transpose of `R`. So you can write (not tested):

```cpp
cv::Mat rvec, tvec;
cv::solvePnP(..., rvec, tvec, ...);
// rvec is 3x1, tvec is 3x1

cv::Mat R;
cv::Rodrigues(rvec, R); // R is 3x3

R = R.t();         // rotation of inverse
tvec = -R * tvec;  // translation of inverse (camera position in the object frame)

cv::Mat T = cv::Mat::eye(4, 4, R.type()); // T is 4x4
R.copyTo(T(cv::Range(0, 3), cv::Range(0, 3)));    // copies R into the top-left 3x3 block
tvec.copyTo(T(cv::Range(0, 3), cv::Range(3, 4))); // copies tvec into the last column

// T is a 4x4 matrix with the pose of the camera in the object frame
```
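To check the inversion trick without OpenCV, here is a minimal, dependency-free sketch. `rodrigues` and `invertRigid` are hypothetical helpers, not OpenCV functions: `rodrigues` uses the axis-angle formula documented for `cv::Rodrigues` (R = cos θ I + (1 - cos θ) k kᵀ + sin θ [k]×), and `invertRigid` computes `[R'|-R'*t]` with a transpose instead of a general inverse.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Axis-angle vector (like rvec) -> 3x3 rotation matrix via the Rodrigues formula.
Mat3 rodrigues(const Vec3& r) {
    const double theta = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    Mat3 R{};
    if (theta < 1e-12) {  // negligible rotation: return the identity
        for (int i = 0; i < 3; ++i) R[i][i] = 1.0;
        return R;
    }
    const Vec3 k{r[0] / theta, r[1] / theta, r[2] / theta};  // unit rotation axis
    const double c = std::cos(theta), s = std::sin(theta);
    // Skew-symmetric cross-product matrix [k]x
    const Mat3 K{{{0, -k[2], k[1]}, {k[2], 0, -k[0]}, {-k[1], k[0], 0}}};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            R[i][j] = (i == j ? c : 0.0) + (1.0 - c) * k[i] * k[j] + s * K[i][j];
    return R;
}

// Inverse of the rigid transform [R|t] is [R^T | -R^T t]; no general inversion needed.
void invertRigid(const Mat3& R, const Vec3& t, Mat3& Rinv, Vec3& tinv) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            Rinv[i][j] = R[j][i];  // transpose
    for (int i = 0; i < 3; ++i) {
        tinv[i] = 0.0;
        for (int j = 0; j < 3; ++j)
            tinv[i] -= Rinv[i][j] * t[j];  // -R^T * t
    }
}
```

With `rvec`/`tvec` copied out of solvePnP, `invertRigid(rodrigues(rvec), tvec, Rinv, camPos)` leaves the camera position in `camPos`, a 3-element vector. Mapping `camPos` back through the original pose (`R * camPos + t`) gives the zero vector, i.e. the camera's own origin.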

Update: to use `T` with OpenGL, keep in mind that the axes of the camera frame differ between OpenCV and OpenGL.

OpenCV uses the convention usual in computer vision: X points to the right, Y down, Z forward (into the scene). The camera frame in OpenGL has X pointing to the right, Y up, Z backward (toward the viewer). So you need to apply a 180 degree rotation around the X axis. The formula for this rotation matrix is on Wikipedia.

```cpp
// T is your 4x4 matrix in the OpenCV frame
cv::Mat RotX = ...; // 4x4 matrix with a 180 deg rotation around X
cv::Mat Tgl = T * RotX; // OpenGL camera in the object frame
```
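For θ = 180° the rotation about X, [[1, 0, 0], [0, cos θ, -sin θ], [0, sin θ, cos θ]], reduces to diag(1, -1, -1): it flips the Y and Z axes and nothing else. A dependency-free sketch, assuming the 4x4 pose is stored row-major as `std::array` (like a `cv::Mat`); `rotX180`, `multiply` and `cvToGl` are hypothetical helper names:

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major, like cv::Mat

// 180-degree rotation about X: cos(180) = -1 and sin(180) = 0, so Y and Z change sign.
Mat4 rotX180() {
    Mat4 m{};
    m[0][0] = 1.0;
    m[1][1] = -1.0;
    m[2][2] = -1.0;
    m[3][3] = 1.0;
    return m;
}

// Plain 4x4 matrix product a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// OpenGL camera pose in the object frame: Tgl = T * RotX.
Mat4 cvToGl(const Mat4& T) { return multiply(T, rotX180()); }
```

Because `RotX` multiplies from the right, the net effect is that the Y and Z columns of `T` are negated while its translation column (the camera position) is left unchanged.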

These transformations are always confusing and I may be wrong at some step, so take this with a grain of salt.

Finally, keep in mind that OpenCV stores matrices in row-major order in memory, whereas OpenGL expects them in column-major order.
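Concretely, `glLoadMatrixd` reads 16 values with element (row i, column j) at index `j * 4 + i`, while a continuous `CV_64F` `cv::Mat` keeps it at `i * 4 + j` (for example in `T.ptr<double>()`). A minimal sketch of the conversion, assuming the pose has already been copied into a row-major array of 16 doubles; `toColumnMajor` is a hypothetical helper. OpenGL 1.3 also provides `glLoadTransposeMatrixd`, which accepts the row-major layout directly.

```cpp
#include <array>

// Convert a row-major 4x4 matrix (OpenCV layout) to column-major (OpenGL layout).
// Element (row i, col j) moves from index i*4 + j to index j*4 + i.
std::array<double, 16> toColumnMajor(const std::array<double, 16>& rowMajor) {
    std::array<double, 16> colMajor{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            colMajor[j * 4 + i] = rowMajor[i * 4 + j];
    return colMajor;
}
```

After the conversion the translation sits at indices 12, 13 and 14, where OpenGL expects it, so the result can be passed as `glLoadMatrixd(colMajor.data())`.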