Diegetic Frames Per Second
10.18.2015
In this post I will describe three different types of frame rate: video, head tracking, and diegetic. I do not believe the last of these has been given a name before. Finally, I will express my excitement over what it will be like to experience all three together in live action virtual reality.
Aesthetics of Video Frame Rate
In 2012, The Hobbit: An Unexpected Journey became the first feature film with a “high frame rate” to enjoy wide release. Up to that time (and still today), film was traditionally viewed at a standard 24 frames per second, but some screenings of The Hobbit boasted double that rate, at 48. To understand why the team behind The Hobbit offered this option, we first have to understand why 24 was the standard in the first place.
Persistence of vision is the name of the illusion that is fundamental to moving pictures. It refers to the effect whereby a succession of related still images can appear to come to life, to animate, to resemble real continuous motion. If one were to view such images at a rate of only one per second, that is, at 1 fps, the succession would still be too slow to produce the illusion; the mind would register them as individual images. It’s not until one increases the frame rate to around 10 fps that the threshold of the illusion is met.
10 fps is fast enough to elicit persistence of vision, sure, but just barely. The motion still looks a tad jerky at that level. As one continues to increase the frame rate, more convincing illusions can be achieved. Generally, the higher the frame rate goes, the more realistic motion appears. This is because the simulation of motion is approaching the limit of our ocular system’s ability to distinguish it from real motion. For most people, this power of discernment maxes out around 100 fps, so recording or replaying faster than 100 is wasted energy.
24, then, is a point on the continuum from 10 to 100 that strikes a lovely balance between realism and magic. It’s not so slow that we get distracted noticing the artifacts of the machinery that connects us with the story world. But neither are we distracted by an insistence on the reality of the real world which the story world was built out of. 24 is a skirting of both apparatus and profilmic, for those theory nerds out there. It makes the magic realistic, while keeping the real magical.
The Hobbit team decided to experiment with this balance. Some audience members liked it, feeling they’d brought Middle Earth even further to life. More audience members, it seemed (or perhaps they were just the more vocal group), disliked it. Perhaps it just screwed with their expectations about what a movie should look like. Perhaps they consciously recognized that it made them think more of the (however talented!) actors in their (however convincing!) makeup and costume being on (however amazingly produced!) sets than it did about the substance of the film experience itself.
In any case, this example clearly illustrates the aesthetic effects of different video frame rate choices.
The Race for Head Tracking FPS
Meanwhile, in virtual reality, frame rate was taking its own journey.
To enter a virtual environment one must wear a head mounted display, or HMD. This display, like a traditional movie theater screen, also uses persistence of vision to simulate motion, and the same 10-100 fps constraints apply to that illusion here. However, there is a second consideration which exists for HMDs – head tracking – and its range of appropriate frame rates is completely different.
HMDs are designed to keep track of the position and orientation of your head. This is so they can update the display to render the virtual world from the proper vantage. This repeated updating needs to occur fast enough that you do not notice it. Unlike with film, slower frame rates here do not produce aesthetically pleasing effects. In fact, when the fps of head tracking is too slow, it generally results in sickness for the user of the HMD. The human mind is simply not used to experiencing a delay between inputs from the senses. If we were to set an HMD’s head tracking to 24 fps, the user’s eyes could still be seeing from the old angle up to 1/24th of a second after the muscles in her neck had twisted. With that mismatch happening to her 24 times a second, things would get a little vomit-inducing.
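To put rough numbers on that delay, here is a minimal sketch. It assumes a head turning at a steady 100 degrees per second, a figure chosen purely for illustration and not a measurement, and it ignores rendering and display latency entirely. The function name is my own invention.

```python
def max_view_error_degrees(tracking_fps: float, head_speed_deg_per_s: float) -> float:
    """Worst-case angle between where the head points and what the display shows.

    The view is refreshed once per tracking frame, so it can be stale by up to
    1 / tracking_fps seconds; during that time the head keeps turning.
    """
    stale_seconds = 1.0 / tracking_fps
    return head_speed_deg_per_s * stale_seconds


HEAD_SPEED = 100.0  # degrees per second; an assumed, illustrative value

for fps in (24, 60, 75, 100):
    error = max_view_error_degrees(fps, HEAD_SPEED)
    print(f"{fps:>3} fps: view can trail the head by up to {error:.2f} degrees")
```

Under these assumptions the worst-case mismatch shrinks in direct proportion to the head tracking rate, which is one way to see why "as fast as possible" is the natural target.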
I doubt anyone ever thought 24 fps was a great setting for an HMD. I can’t imagine there was ever much contention about the ideal head tracking fps being “as fast as possible; anything as long as it is beyond the threshold of human perception”. In reality, though, we didn’t have the technology to be able to head track and render at 100 fps at first. This is one of the key reasons virtual reality was impractical for the masses for so long. When consumer headsets began to appear, head tracking fps offerings were still only around 60 or 75. It is safe to say that in the near future, however, head tracking fps will be a non-issue, and it will be simply assumed that any HMD is sufficient in that respect.
Head tracking fps exists even for completely motionless virtual environments. Whether these environments are computer generated or captured from real life, if the content of the display is still, your motion as the viewer being tracked still creates differences in the display’s contents from render frame to render frame.
Virtual environments are often in motion, though, and when they are, the potential exists for multiple different frame rates: one for the head tracking, and one for the environment. For simplicity’s sake we will think of changes in the environment as scripted, so we can consider this frame rate analogous to video. Let’s consider an immersive cinematic experience, shot with a 360 degree camera rig.
Maybe this camera used the same fps as the HMD, that is, it was shot at 100 fps. This is well beyond what is used for normal film and video (24 or 30), and well beyond even what was used on The Hobbit (48). This might have been done in order to make what it recorded feel as real as possible. The feeling of presence in a virtual environment, or the loss of awareness of mediation, is, after all, a common goal among virtual content creators.
But maybe the camera did shoot at a traditional speed like 24. Why not? Lower fps on the video should not induce sickness. Suppose the HMD is at 96 while watching video at 24, and you’re moving your head to look at different sections of the action all around you: you’ll be seeing four slightly different croppings for each frame of video. The important aspect of the fundamental phenomenological continuity is still there; your brain is just tricked into feeling surrounded by objects whose motions occur at a stylistically lower fps.
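The 96-over-24 arithmetic can be sketched as a tiny lookup from HMD render frame to video frame. This is only an illustration of the idea, not how any particular headset or player schedules frames; the function name and defaults are hypothetical.

```python
def video_frame_for_render_frame(render_index: int, hmd_fps: int = 96, video_fps: int = 24) -> int:
    """Index of the video frame on screen during a given HMD render frame.

    Integer floor division keeps the mapping exact: each video frame is held
    until enough render frames have elapsed to reach the next one.
    """
    return (render_index * video_fps) // hmd_fps


# The first twelve render frames of the headset, and the video frame each one shows.
schedule = [video_frame_for_render_frame(i) for i in range(12)]
print(schedule)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Each video frame appears four times in a row, and each of those four renders can crop it differently depending on where your head is pointing at that instant.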
Immersion in Film
Whether shot at 100 or 24, most immersive cinema isn’t quite immersive yet. In fact, many experts do not even consider it true virtual reality. The reason for this distinction is that it is not yet possible to simply switch on a 360 camera and record three-dimensional action over time in a real life environment. It is possible to simulate the effect using a number of different tools and a ton of time with computers, but not practical. For the next couple of years, at least, most immersive video will be more akin to watching a movie displayed on a spherical shell all around you, with the actors all still basically flat, glued to this virtual surface.
Some will at least use more advanced technology to include stereoscopic effects, localized sweet spots where the images displayed to each eye are different, simulating depth. But parallax, the ability to see around objects and have them move past each other at different rates depending on their distance from you when you bob your head side to side, is not yet possible with photorealistic moving experiences. The exception is experiences cobbled together from separate still captures of environments and studio-captured performances of actors, with huge numbers of cameras looking in on them one at a time rather than out at everything at once.
And of course there is straight CG animation, where 3d immersion presents no technical obstacles at all. It would be no problem to generate an animation to be experienced at 100 head tracking fps but played back at 24 fps, so that one could know what it was like to feel perfectly present in a world which itself operates at a stylistically lower frame rate, with 3d frames of existence.
But I lament the lack of real-lifeness here. For that we have stop motion. Coming out of USC-ICT: 3d captured stop-motion animation with a stylistically low frame rate. Real life objects you can look around at from any angle like they were real, but seeming to move with otherworldly quantization. However, the motion has that distinct signature of stop motion; it is not the truly smooth motion that things with muscles or gears would make.
So for me, I am super excited about the near future when photorealistic immersive recordings are commonplace, where I can be immersed not merely inside a recording of another world, but inside a film, with all the magic of its irreality. Transposing the magical quality that low frame rate lends to film into the third dimension. Perhaps as accoutrements to this we could have film grain but applied atmospherically in space, or spherical cigarette burns before cuts, or other such artifacts.
Experiencing a realistic world in another fps would be enough, but a realistic world with multiple fps’s would be even more amazing. Here’s where we finally get to the new term: diegetic fps.
Watching anime, one may from time to time notice that different stretches of animation seem to play at different frame rates. Objects in the foreground may move 12 times a second, while those in the background move only 8 or even 6. Thus, two types of fps exist just within this traditional 2d medium: the video fps, and that of individual objects inside the world it portrays. Though clearly the reasons for the choice of frame rate are budgetary first and only secondarily intentional (in terms of their expressing the story world), I would suggest calling the latter type of fps “diegetic”.
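As a rough sketch of the distinction, assume a 24 fps video in which each object only changes pose at its own diegetic rate. The function name and the choice of 12 and 8 for foreground and background are illustrative, taken from the figures above rather than from any real production pipeline.

```python
def pose_index(video_frame: int, diegetic_fps: int, video_fps: int = 24) -> int:
    """Which drawing (pose) of an object is visible on a given video frame.

    An object with a diegetic fps below the video fps holds each pose for
    several consecutive video frames.
    """
    return (video_frame * diegetic_fps) // video_fps


# Half a second of 24 fps video: twelve video frames.
foreground = [pose_index(f, diegetic_fps=12) for f in range(12)]
background = [pose_index(f, diegetic_fps=8) for f in range(12)]

print(foreground)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(background)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

The video frame counter advances every step, but the foreground object changes pose on every second frame and the background on every third. That per-object rate is what I am calling diegetic fps.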
For a three-dimensional example, consider the playable character Mr. Game & Watch from Super Smash Bros. Unlike the other characters, he is modeled after more primitive video game technology, so he animates at a stylistically lower frame rate (below the threshold of persistence of vision). He can still move around overall at a normal rate, and he doesn’t lag in updating when the camera moves; it is just he himself who marches to the beat of a slower drummer.
To me, the biggest trip would be experiencing three different frame rates simultaneously, in virtual reality:
- Maxed-out head tracking fps,
- 24 fps video,
- co-existing actors performing at various fps’s up to 24 (anything faster would be pointless, since the video imposes a hard cutoff at 24)
The objects are real, the motion is real, presence is real, but the quantization of the motion is stylistically slow and variable. This is the dream.
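The three layers in the list above can be composed in one small sketch. Everything here is hypothetical: the 96 fps headset rate, the actor names, and the function itself are assumptions for illustration, not a description of any existing system.

```python
HMD_FPS = 96    # assumed head tracking / render rate
VIDEO_FPS = 24  # video rate from the list above


def scene_state(render_index: int, actor_fps: dict[str, int]) -> tuple[int, dict[str, int]]:
    """Return (video_frame, {actor: pose_index}) for one HMD render frame."""
    # Layer 1 to layer 2: which video frame is held during this render frame.
    video_frame = (render_index * VIDEO_FPS) // HMD_FPS
    poses = {}
    for name, fps in actor_fps.items():
        # Hard cutoff: an actor cannot update faster than the video itself.
        effective_fps = min(fps, VIDEO_FPS)
        # Layer 2 to layer 3: which pose the actor holds on this video frame.
        poses[name] = (video_frame * effective_fps) // VIDEO_FPS
    return video_frame, poses


# Hypothetical actors with different diegetic frame rates.
actors = {"hero": 12, "ghost": 6, "hummingbird": 48}
print(scene_state(8, actors))
```

Every render frame gets a fresh head pose, the video frame advances on every fourth render, and each actor advances on its own slower schedule. The "hummingbird" asks for 48 but is capped at 24, which is the hard cutoff mentioned in the list.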