Spherical video is tricky. The process of stitching together multiple images to create an equirectangular projection image doesn’t give you an exact image resolution. So what resolution should I be outputting my video to? Too low a resolution and you’re losing detail. Too high a rendered resolution and you’re wasting bandwidth, since no extra detail is created when an image is scaled up.
In using the new Vuze Camera from HumanEyes, I wanted to figure out what the ideal rendered output should be to maximize image quality and minimize wasted file size. It’s easy to output to 4096 x 4096 px and be done with it, but that resolution is not easily playable by most devices today, and it may be a waste of bandwidth.
So the goal is to get a ballpark idea of what the final rendered resolution should be, based on the data recorded by each sensor. I want to retain as much detail as possible from the raw footage to the YouTube file, so that the video is as sharp as possible when viewed in 3D 360º, without wasting file size.
The resolution I came up with is 3200 x 2880 pixels. Read on to find out how I came to that conclusion.
The video footage that comes off the camera is stored in mp4 files. The mp4 files have 4 streams in them. Two of the streams are video and two streams are audio. The two stereo audio streams carry the signals from the 4 MEMS microphones that are in the camera.
The two video streams make up the raw video captured by the SONY image sensors.
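If you want to check the stream layout of your own files, here is a minimal sketch that asks ffprobe (part of FFmpeg, assumed to be installed and on the PATH) for the streams and groups them by type. The helper names `probe_streams` and `summarize` are my own, and the sample data is hand-written to mirror the layout described in this post, not copied from a real probe.

```python
import json
import subprocess


def probe_streams(path):
    """Run ffprobe on `path` and return the list of stream dictionaries."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "stream=index,codec_type,width,height,channels",
        "-of", "json",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"]


def summarize(streams):
    """Group streams by type: video as (width, height), audio as channel count."""
    video = [(s["width"], s["height"]) for s in streams if s["codec_type"] == "video"]
    audio = [s.get("channels") for s in streams if s["codec_type"] == "audio"]
    return {"video": video, "audio": audio}


# Hypothetical sample that mirrors the layout described in this post.
# For a real file, use: streams = probe_streams("your_clip.mp4")
sample_streams = [
    {"index": 0, "codec_type": "video", "width": 3200, "height": 2178},
    {"index": 1, "codec_type": "video", "width": 3200, "height": 2178},
    {"index": 2, "codec_type": "audio", "channels": 2},
    {"index": 3, "codec_type": "audio", "channels": 2},
]

summary = summarize(sample_streams)
print(summary)
print("microphone channels:", sum(summary["audio"]))
```

Two stereo streams add up to four channels, which lines up with the four microphones.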
Each of the two video streams captures 4 camera views. Each view is the spherical image that the actual sensor sees.
Pretty elegant method to encode 8 camera views. I think the sensors are 1920x1080p capture devices.
As you can see from the image below, the image, or rather the sensor, is rotated 90º, so that the wide part of the sensor (the 16 part of 16×9) captures the image top to bottom, which is 180º worth of data. The narrow part of the sensor (the 9 part of 16×9) captures the lateral direction of the image. This makes sense since laterally each camera only needs to capture 90º, which means there’s some overlap in the images, which helps with the stitching.
The mp4 video stream resolution is 3200 x 2178. This means that each sphere of video information from each camera has a resolution of 1600 x 1089, or rather 1089 x 1600 since the 1600 pixels capture 180º of info from top to bottom.
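The arithmetic above can be sketched in a few lines. This assumes the four views in each stream are packed in a 2 x 2 grid, which is what the 1600 x 1089 figure implies. The function names are mine.

```python
def tile_size(stream_w, stream_h, cols=2, rows=2):
    """Size of one camera view, assuming the views are packed in a cols x rows grid."""
    return stream_w // cols, stream_h // rows


def upright(tile_w, tile_h):
    """Swap width and height to account for the sensor being rotated 90 degrees."""
    return tile_h, tile_w


tile = tile_size(3200, 2178)
view = upright(*tile)
print("stored tile:", tile)   # (1600, 1089)
print("upright view:", view)  # (1089, 1600)
```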
This means that the camera is either not using the full 1920 pixels available (cropping to 1600px) or scaling the 1920 pixels down to fit into 1600px. I think the former is more likely. And looking at the spherical images, it looks like even the 1600 pixels are not being fully used, since there’s a black border around the top and bottom of the captured image.
So the used image is likely 1400-1500px covering 180º. This means that laterally (since 1500px = 180º), the resolution needed to cover 90º is about 700-800px. When four 90º segments are joined, the resulting resolution (that covers 360º around the camera) will be around 2800-3200px.
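Here is that estimate as a quick sketch. It assumes the lens spreads pixels evenly per degree in both directions, which is only roughly true for a fisheye lens, so treat the output as a ballpark. The 1600 row is the upper bound you would get if the full tile height were used. `panorama_width` is my own name.

```python
def panorama_width(used_px_for_180, lateral_fov_deg=90, cameras_around=4):
    """Estimate the 360-degree width from the pixels that cover 180 degrees vertically.

    Assumes the same pixels-per-degree horizontally as vertically.
    """
    return round(used_px_for_180 * lateral_fov_deg * cameras_around / 180)


estimates = {used: panorama_width(used) for used in (1400, 1500, 1600)}
for used, width in estimates.items():
    print(f"{used}px over 180 degrees -> about {width}px around")
```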
This tells me that there is no point in rendering the video to a 4096px-wide resolution, since the maximum number of pixels available is about 3200px across.
The 4k preset in the HumanEyes VR Studio software (the software that stitches the video) is 3840 x 2160px. That format stores two video streams (left and right eye) in one file, so each eye gets a resolution of 3840 x 1080px.
However, a lot of information is lost since the 1500px worth of info is crammed into 1080 pixels.
A more appropriate video resolution for a final output should cover 3200 x 1500 px per eye, but I don’t think 3200px wide is a standard resolution, so 3840 is the next closest resolution. Similarly, I don’t think 1500px is a standard resolution, but I have seen 1440px listed as an option in YouTube, and 2880px is considered 5k by YouTube, so I’ll use that instead.
The resolution to try is 3200 x 2880px, which gives each eye 3200 x 1440px. This is very close to the native resolution that can be generated by the image sensors on board the camera, which means you’re not wasting any file size for no extra detail. YouTube accepts this resolution (2880 is considered 5K).
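To compare the candidate output sizes against the estimated native detail, here is a rough sketch. It assumes the two eyes share one frame by splitting its height in half, and it takes 3200 x 1500 px per eye as the native estimate from above. A ratio under 1.0 means detail is being squeezed out, and a ratio over 1.0 means pixels are being spent on upscaling. The names are mine.

```python
NATIVE_PER_EYE = (3200, 1500)  # ballpark estimate from the sensor analysis above


def per_eye(frame_w, frame_h):
    """Per-eye size, assuming both eyes split the frame height evenly."""
    return frame_w, frame_h // 2


def scale_vs_native(frame_w, frame_h, native=NATIVE_PER_EYE):
    """Return (horizontal, vertical) ratios of per-eye output to native detail."""
    w, h = per_eye(frame_w, frame_h)
    return round(w / native[0], 2), round(h / native[1], 2)


candidates = {
    "4k preset": (3840, 2160),
    "3200 x 2880": (3200, 2880),
    "4096 x 4096": (4096, 4096),
}
for name, (w, h) in candidates.items():
    print(name, per_eye(w, h), scale_vs_native(w, h))
```

With these numbers, the 4k preset keeps only about 72% of the vertical detail, while 3200 x 2880 keeps about 96% and adds no horizontal padding.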