First I'd like to roughly define what video segmentation is, then talk about it in more technical detail.
The term video segmentation is used with two applications in mind. The first is making a long video short by extracting the key shots and images and putting them in a sequence that summarizes the long video. You can think of it as making movie trailers automatically.
The other is extracting the objects that appear in a video and tracking them through a series of shots; in short, making a video of each object out of one big video. You can think of it as Arnold Schwarzenegger as the Terminator, identifying objects with his "eyes" and getting data about whatever he's seeing.
Both problems are challenging and interesting, but I'll focus on the challenges faced when developing algorithms for the second definition.
If you have ever used Photoshop then you have probably done some image segmentation already. If you still don't know what I am talking about, remember the Magic Wand tool. Using that tool you can select objects in an image almost automatically. A lot of techniques have been developed for that problem and they are quite useful now. But let's think about how complicated it gets when the image turns into moving objects.
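To make the Magic Wand idea a bit more concrete, here is a minimal sketch of one way such a selection could work: a flood fill that starts at the clicked pixel and grows over neighbouring pixels of similar brightness. This is only an illustration, not Photoshop's actual method; the function name `magic_wand`, the `tolerance` parameter and the toy image are all made up for this example.

```python
from collections import deque

def magic_wand(image, seed, tolerance):
    """Select the 4-connected region around `seed` whose pixel values
    are within `tolerance` of the seed pixel (a simple flood fill)."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    selected = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Look at the pixels above, below, left and right of the current one.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in selected
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                selected.add((nr, nc))
                queue.append((nr, nc))
    return selected

# Toy 4x4 grayscale image: a bright 2x2 "object" on a dark background.
image = [
    [10,  12,  11, 10],
    [11, 200, 205, 12],
    [10, 198, 202, 11],
    [12,  10,  11, 10],
]
obj = magic_wand(image, seed=(1, 1), tolerance=10)
print(sorted(obj))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Clicking on the bright pixel at (1, 1) selects the four bright pixels and leaves the dark background alone. Real tools work on colour images and handle soft edges, so treat this as the bare idea only.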
A simple answer would be: a video is just a series of images, so we can keep selecting the object in every one of those frames.
But simply doing that raises a question: how can you know that the object you're selecting now is the same object you selected in the previous frame?
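One simple (and far from complete) way to answer that question is to assume an object does not move much between two consecutive frames, and to pair up selections whose bounding boxes overlap the most. The sketch below does this with intersection-over-union (IoU) and a greedy match. The function names, the `min_iou` threshold and the boxes are assumptions made up for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_objects(previous, current, min_iou=0.3):
    """Greedily pair each known object with the unused box in the new
    frame that overlaps it most. Returns {object_id: index_in_current};
    objects with no overlap of at least `min_iou` are left out."""
    matches, used = {}, set()
    for obj_id, prev_box in previous.items():
        best, best_score = None, min_iou
        for i, cur_box in enumerate(current):
            score = iou(prev_box, cur_box)
            if i not in used and score >= best_score:
                best, best_score = i, score
        if best is not None:
            matches[obj_id] = best
            used.add(best)
    return matches

# Boxes from the previous frame (with names) and the new frame (unnamed).
previous = {"ball": (10, 10, 30, 30), "car": (100, 50, 160, 90)}
current = [(104, 52, 164, 92), (14, 12, 34, 32)]
matches = match_objects(previous, current)
print(matches)  # {'ball': 1, 'car': 0}
```

This works for the toy boxes because each object only shifted by a few pixels. It says nothing about what the object looks like, so it is easily fooled once objects move fast or cross paths.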
Objects tend to move, and so do backgrounds, and at some points parts of one can look so much like parts of the other that they seem to be the same object.
Objects also tend to cross paths and hide behind one another, making it look as if objects are disappearing.
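A common trick for objects that briefly vanish behind something is to not give up on them right away: keep each track alive for a few frames and only drop it if it stays missing. Here is a minimal sketch of that bookkeeping. It assumes some other step has already decided which object IDs were seen in the current frame; `update_tracks` and `max_missed` are hypothetical names for this example.

```python
def update_tracks(tracks, seen_ids, max_missed=2):
    """`tracks` maps object_id -> number of consecutive frames it has been
    missing. Objects seen this frame are reset to 0, the others are
    incremented, and anything missing for more than `max_missed` frames
    is dropped."""
    updated = {}
    for obj_id, missed in tracks.items():
        missed = 0 if obj_id in seen_ids else missed + 1
        if missed <= max_missed:
            updated[obj_id] = missed
    return updated

# The car hides behind something for two frames, then shows up again.
frames = [{"ball", "car"}, {"ball"}, {"ball"}, {"ball", "car"}]

tracks = {"ball": 0, "car": 0}
for seen in frames:
    tracks = update_tracks(tracks, seen, max_missed=2)
print(tracks)  # {'ball': 0, 'car': 0}
```

With `max_missed=2` the car survives its two hidden frames. With `max_missed=1` it would be dropped in the third frame, and when it reappears it would have to be treated as a brand new object.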
Another important behaviour is that many objects change the way they look as they move (think of a rubber ball deforming when it bounces).
With all of those considerations in mind, developing an algorithm that does object identification, object tracking and event detection becomes a very interesting and difficult task.
To wrap it up, I'll give a simple example which I guess must be using video, or at least image, segmentation: digital cameras. Cool new digital cameras can detect faces and track them while the camera is moving. Just add a face recognition system and you've got yourself the first component of the Terminator's system :D