Tuesday, September 29, 2009

Illinois Web Accessibility Conference and Expo: Session 2 – Video Captioning

Speaker: Colleen Cook, ATLAS, UIUC

Their first step is trying to make the task of transcription easier for users. They are also trying to require, through policy, that users provide a transcript.

They are also trying to use some automation to automatically generate a synchronized transcript, matching speaking pauses to commas, periods, and paragraph breaks.

Finally, they are focusing on marketing: educating instructors that this service is available and what they need to do to make their content more accessible. They are also advertising how this helps sighted users with search: a transcript/synchronized caption makes text-based searching of audio/video presentations far more user friendly.

They also encourage people to keep their audio/video to 5-10 minutes. Most web users just won't sit through a video that is longer than that. Plus, it makes it easier for them to caption and synch it. In fact, they don't charge for captioning audio/video under 10 minutes.

They are currently using InqScribe for transcription, and MacCaption for synching the captions.

They have a web-based product that allows the user to upload the transcript and the audio. The software scans for pauses, then does its best to synch the captions with the audio. Then it prompts the user to massage the data.
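They didn't go into implementation details, but the pause-scanning step is essentially silence detection. Here's a minimal sketch of the idea in Python, assuming a mono 16-bit WAV; the thresholds and function name are my own inventions, not theirs (their actual tool is compiled C):

```python
import wave
import audioop

def find_pauses(path, frame_ms=20, silence_rms=300, min_pause_ms=400):
    """Scan a mono 16-bit WAV for stretches of low energy (candidate
    speech pauses) and return each pause's start time in seconds."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        width = w.getsampwidth()
        frames_per_window = rate * frame_ms // 1000
        pauses, quiet_ms, t = [], 0, 0.0
        while True:
            window = w.readframes(frames_per_window)
            if not window:
                break
            if audioop.rms(window, width) < silence_rms:  # low-energy window
                quiet_ms += frame_ms
                if quiet_ms == min_pause_ms:  # long enough to call it a pause
                    pauses.append(t - (min_pause_ms - frame_ms) / 1000.0)
            else:
                quiet_ms = 0
            t += frame_ms / 1000.0
    return pauses
```

Each comma, period, or paragraph break in the transcript would then get matched against the next detected pause, which is presumably why they prompt the user afterwards to fix up the inevitable misalignments.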

They create SMIL and SAMI files. They will also work on closed captioning for DVDs. The PowerPoint listed a bunch of software; hopefully they'll post that. If they do, I'll come back and link to it.
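SAMI, at least, is just HTML-ish markup with millisecond <SYNC> points, so it's easy to generate. A toy writer, assuming captions come in as (start_ms, text) pairs (timings invented):

```python
def write_sami(captions, path):
    """Write (start_ms, text) pairs as a minimal SAMI caption file,
    the format Windows Media players read."""
    with open(path, "w") as f:
        f.write('<SAMI><HEAD><STYLE TYPE="text/css"><!--\n'
                'P { font-family: Arial; }\n'
                '.ENUSCC { Name: English; lang: en-US; }\n'
                '--></STYLE></HEAD><BODY>\n')
        for start_ms, text in captions:
            f.write('<SYNC Start=%d><P Class=ENUSCC>%s</P></SYNC>\n'
                    % (start_ms, text))
        f.write('</BODY></SAMI>\n')

write_sami([(0, "Welcome to the session."),
            (3200, "Today: video captioning.")], "talk.smi")
```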

Their current criterion is that the auto-synching be within a half second of the actual audio.

They also have the option to auto-segment audio in five-second increments. This is useful for more conversational speaking, Q&A, anything that isn't a "read"/prepared presentation. This also works for people without transcripts, as they can listen to each five-second segment and then type in the caption directly. They are working on increasing/decreasing segment length, as well as removing or inserting segments. They want to have the segments hit the next "probable" speech pause instead, so the arbitrary time-based cut-off doesn't clip the middle of a sentence or word (a rough sketch of that idea is below).

Right now the IITAA doesn't require this for course content, only publicly available content. Classroom content is only required to be accessible if a member of the class needs and requests it.
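A sketch of that snap-to-pause segmentation, reusing the hypothetical find_pauses() above (the one-second snap window is my guess; their stated accuracy target is a half second):

```python
def segment(duration_s, pauses, seg_len=5.0, snap_window=1.0):
    """Cut audio into roughly seg_len-second segments, moving each cut
    to the nearest detected pause within snap_window seconds so the
    fixed-length cut-off doesn't land mid-word."""
    cuts, t = [0.0], 0.0
    while t + seg_len < duration_s:
        target = t + seg_len
        near = min(pauses, key=lambda p: abs(p - target), default=None)
        # snap to the pause only if one is close enough to the target
        t = near if near is not None and abs(near - target) <= snap_window else target
        cuts.append(t)
    cuts.append(duration_s)
    return list(zip(cuts[:-1], cuts[1:]))  # (start, end) pairs in seconds
```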

They have this working for both Flash and QuickTime. They also want to incorporate additional audio data. Contact the presenter at dpharvey@eiu.edu

Slides will be available online at http://www.eiu.edu/~cats/iwaac/

Q: Can anyone access/use this now? A: We hope so, soon. They are looking for the best way to distribute it: maybe a web app people log in and use, but maybe they'll go the downloadable-app route.

Q: Do you have data on cost per minute to do the synchronized captioning? A: Right now their tool is quicker than real time (it works with compiled C code, making it faster than, say, PHP). It's definitely cheaper than any professional service, and they hope to provide this service for free. Most tools require you to sit through and do this manually; even with a transcript, that takes at least (and usually more than) real time: a 20-minute clip takes 20+ minutes to synch. Only 1 time in 10 does the auto-synching require manual intervention.

Q: Anything there to [couldn't hear this question]? A: No, this is geared primarily for speech, not video.

Q: Can it accept a file, or do you need to type it in? A: You can just upload it with the media file.

Speaker: Angie Anderson, Accessible Media Service, DRES, UIUC

They've only been doing this for about two years now, and they are focusing on IITAA compliance. They are trying to educate the faculty about the need to caption. They have run into a few instructors who are very resistant to captioning video, and they've been using those needs cases (a disabled user in the class) to show instructors why there is a need for this [just beyond search improvement?]. They currently have one academic hourly doing most of the captioning. She works 40 hours a week and is constantly busy.

They are currently working with the Office of Public Affairs to come up with a campus-wide policy on captioning, which will help make faculty aware of what their responsibilities are. Some faculty don't even know what markup format/accessibility improvements will work in their smart classrooms (just how to turn the caption option on for a DVD). It's important to keep data on how long captioning takes, so you can get more money from your director (which you will need later). They also use a lot of student workers. Make friends with people on campus who create a lot of video, like Colleen or Liam from ATLAS.

Professors are going to provide videos in lots of different formats, created with a myriad of tools, some of them very odd and hard to deal with. They ran into a few formats that the captioning software couldn't use. They use Windows Movie Maker (standard on Windows computers). They always add a "captioned by" plug at the beginning or end of the video so people know who's doing this on campus. They also use Express Scribe with WordPerfect; it can extract audio from a video for you, and you can use a foot pedal to start/stop the audio while transcribing.

Their main captioning software is from CPC. It's high end and expensive: their version is $5,000, but they've paid for that many times over in the last two years. The software is great, and the support has been outstanding, especially early on when they were just trying to figure out how to do video captioning at all (more a user issue). The only problem is that it can't import Flash videos; they mostly use AVI or Windows Media files. They try to stick to 25-30 characters per caption line. They also use markup stamps (e.g., a music icon) and speaker-change indications. The most prevalent format on campus right now is Flash, so they export the data to XML, then use Adobe (currently CS3, soon CS4) to stick the captions into the Flash files and synch them.
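That 25-30 characters-per-line rule is the kind of thing that's easy to automate. A trivial sketch (their CPC software surely does something smarter about choosing break points):

```python
import textwrap

def caption_lines(transcript, width=30):
    """Break transcript text into caption-sized lines of at most
    `width` characters without splitting words."""
    return textwrap.wrap(transcript, width=width)

print(caption_lines("They try to stick to 25 to 30 characters "
                    "per caption line, plus speaker changes."))
```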

The national standard for captioning is about 6-8 hours of work per hour of audio/video. But with a transcript provided, the turnaround time is closer to an hour (or close to the real running time of the actual video). Most of the time it takes about a day per video.

About 90% of their time is spent creating the transcript, so they use lots of students. They also sometimes take people working off community-service hours to get free transcription.

Q: What other tools have you tried and decided not to use? What about Dragon NaturallySpeaking, for instance? Something for people who are only doing captioning sporadically? A: A lot of people on campus have tried it and not been happy with it.

Q: Do you have data on cost per hour? A: Yes, we keep it, but we don't have it on hand. Last semester they captioned about 400 videos, some 3-minute clips, some hour-long ones.

Q: Moving from captions to subtitling? A: Not yet, but they're considering it. Many professors don't like to see always-on on-screen subtitles.

Q: Dragon for parroting? A: They haven't tried it, but they feel they can type faster than they can talk.

Speaker: Liam from ATLAS

Normally, if they have a video that needs captioning, they go to Angie at DRES. If it's too complex, then they jump back in. They often require users to contract with a third party to create the transcript; then they work from that. Their tool [missed the name, but it's locally developed, so it may not have one; they also use Encore] takes the (usually garbage) text file and dumps out segments of the appropriate character length for captions. Then it prompts the user to correct the mistakes. This then dumps out a DFXP .xml file with time codes. A third program then allows you to synch those captions up with segments of audio/video.
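If I heard "DFXP" right, that output file is plain Timed Text XML. A minimal hand-rolled writer for (start_s, end_s, text) cues might look like this (a sketch, not their tool; the namespace is the early one Flash CS3-era caption components understood):

```python
def write_dfxp(cues, path):
    """Write (start_s, end_s, text) cues as a minimal DFXP/Timed Text
    file, the XML caption format Flash players of that era consumed."""
    def clock(t):  # seconds -> "HH:MM:SS.mmm"
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return "%02d:%02d:%06.3f" % (h, m, s)
    with open(path, "w") as f:
        f.write('<?xml version="1.0" encoding="utf-8"?>\n'
                '<tt xmlns="http://www.w3.org/2006/04/ttaf1" xml:lang="en">\n'
                '<body><div>\n')
        for start, end, text in cues:
            f.write('<p begin="%s" end="%s">%s</p>\n'
                    % (clock(start), clock(end), text))
        f.write('</div></body></tt>\n')

write_dfxp([(0.0, 3.2, "Welcome back."),
            (3.2, 7.5, "Let's talk captions.")], "captions.dfxp.xml")
```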

They are working hard on a player to show the multimedia with captions. They want captions and scene description [descriptive audio]. There are currently two good players available; one is made by WGBH and one is by Ohio State. They are both based on the GW player (?), an open source product.

For the scene description, the current best practice is to have a second mp3 file playing (to describe the content of a PowerPoint slide, for instance). But they'd prefer to integrate it into a single file, or at least make a player that can play/control the two separately (and, for instance, pause the main A/V when the descriptive track has content to play).
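That pause-the-main-track behavior is basically a scheduling rule inside the player. A toy sketch of the control logic against a hypothetical player API (none of these method names come from their work):

```python
def on_tick(player, desc_cues, now_s):
    """Run on each playback tick: if an audio-description cue is due,
    hold the main A/V and let the description mp3 speak, then resume.
    desc_cues is a list of (start_s, duration_s) pairs."""
    for start, duration in desc_cues:
        if start <= now_s < start + duration:
            if not player.main_paused:
                player.pause_main()             # hold the video/audio
                player.play_description(start)  # play the matching cue
            return
    if player.main_paused and player.description_finished():
        player.resume_main()
```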

Prototype at http://flash.atlas.illinois.edu/Prototype.html. Liam couldn't show it because the network blocks the port it needs.