[EN] How the official Netflix team implemented Japanese subtitle support

Original link: medium.com

Implementing Japanese Subtitles on Netflix

Japanese subtitles were first made available on the Netflix service as a part of the Japanese launch in September 2015. This blog post provides a technical description of the work we did leading up to this launch. We cover topics including our specification for source subtitle files, the transformation model from source subtitle files to those deliverable on the Netflix service, the model for delivery of Japanese subtitles on our service as well as the interaction of our work with the upcoming W3C subtitle standard, Timed Text Markup Language 2 (TTML2).

Towards the end of 2014, we were working on the technical features for the planned September 2015 launch of Netflix in Japan. At the time, we were mindful that other streaming services that were operating in the Japanese market had received criticism for providing a substandard subtitle experience. Armed with this knowledge, and our desire to maintain the high Netflix standard of quality, we made a decision to implement rigorous support for all of the “essential” Japanese subtitle features — those that would be customary in a premium Japanese video service. This was on top of our following existing subtitle requirements:

  • Subtitles must be delivered to clients separate from the video (i.e. no burned-in subtitles); and
  • All subtitle sources must be delivered to Netflix in text format, in order to be future-proof.

Essential Japanese Subtitle Features

Overview

Through a combination of market research and advice from Japanese language and media experts, we identified five essential Japanese subtitle features: rubies, boutens, vertical text, slanted text, and “tate-chu-yoko” (horizontal numbers in vertical text). Each is described below. From our perspective, this list alone was indicative of the complexity of the challenge at hand.

Rubies

Ruby annotations describe the base text that they are associated with. They help explain the meaning of unfamiliar, foreign, or slang words and phrases, and/or convey the pronunciation of kanji characters that are rare or unknown. They can help provide cultural context to a translation, which allows the viewer to enjoy the content with a deeper understanding. Common practice in subtitling is to display rubies in a font size that is smaller than the base text and to place them above the base character for single-line subtitles and for the first line of two-line subtitles. Rubies are placed below the base character if they appear on the second line of a two-line subtitle. Rubies should never be placed between two lines, as it is then difficult to discern which base character they are associated with. Figure 1 shows a ruby example for the dialogue “All he ever amounted to was chitlins.”

Figure 1: A “ruby” example

The base text translates the word “chitlins”*, while the ruby provides transliteration of the word “chitlins” so that viewers can more closely associate the keyword of the dialogue to the translation. As mentioned above, rubies should never be placed in between two lines. Figure 2 shows the proper placement for rubies with two-line subtitles. In the unlikely event that a subtitle spans 3 lines, it is preferable to have the rubies on top of each line, except for the last line where they should be at the bottom.

Figure 2: Proper “ruby” placement for a two-line subtitle
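The placement rule described above can be written down as a small function. This is only a sketch of the rule as stated in this post (rubies above every line, except below the last line of a multi-line subtitle); the function name `ruby_side` and the return values "above" and "below" are illustrative choices, not part of any Netflix tool or of TTML2.

```python
def ruby_side(line_index: int, line_count: int) -> str:
    """Return where a ruby goes for the given 0-based line of a subtitle.

    Encodes the convention described in the text: rubies sit above the
    base text, except on the last line of a multi-line subtitle, where
    they sit below, so that no ruby ever lands between two lines.
    """
    if not 0 <= line_index < line_count:
        raise ValueError("line_index must be within the subtitle's lines")
    # A single-line subtitle always carries its rubies above.
    if line_count > 1 and line_index == line_count - 1:
        return "below"
    return "above"


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, [ruby_side(i, count) for i in range(count)])
```

For a three-line subtitle this yields above, above, below, which matches the preference stated for the unlikely three-line case.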

Boutens

Boutens are dots placed above or below a word or phrase that act as literal points of emphasis, equivalent to the use of italics in English. Boutens can help express implied meanings which provide a richer and more dynamic translation. Figure 3 shows a bouten example for the dialogue: “I need someone to talk to.”

Figure 3: A “bouten” example

This subtitle has boutens above the word for “talk”. In the context of this scene, placing emphasis on this word allows the viewer to understand the implication that the speaker needs someone to provide him/her with privileged information.

Vertical Subtitles

Vertical subtitles are generally used to avoid overlap with on-screen text present in the video. This is the Japanese equivalent to putting subtitles at the top of the screen. This is illustrated in Figure 4.

Figure 4: A vertical subtitle co-existing with on-screen credits

Tate-chu-yoko

In Japanese typography, vertical text often includes short runs of horizontal numbers or Latin text. This is referred to as tate-chu-yoko. Instead of stacking the characters vertically, half-width characters are placed side by side to enhance legibility and allow more characters to be placed on a single subtitle line. This is illustrated in Figure 5 for the dialogue “It’s as if we are still 23 years old”. In this example subtitle, the number “23” uses half-width numeric characters and employs the tate-chu-yoko functionality.

Figure 5: A vertical subtitle with a horizontal run of numbers
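As a rough illustration of how a tool might find candidates for tate-chu-yoko, the sketch below splits a line of vertical text into segments and flags short runs of half-width (ASCII) digits. The function name, the two-digit default limit, and the Japanese sample string are assumptions made for this example; they are not taken from the Netflix pipeline or from any specification.

```python
import re


def split_tate_chu_yoko(text: str, max_run: int = 2):
    """Split text into (segment, is_horizontal_run) pairs.

    A run qualifies when it is 2..max_run consecutive ASCII digits that
    are not part of a longer digit sequence. Full-width digits are left
    alone because they already stack well in vertical text.
    """
    pattern = re.compile(r"(?<![0-9])[0-9]{2,%d}(?![0-9])" % max_run)
    segments = []
    pos = 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], False))
        segments.append((match.group(), True))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], False))
    return segments


if __name__ == "__main__":
    # Made-up sample line containing the half-width number "23".
    print(split_tate_chu_yoko("まだ23歳みたい"))
```

Longer digit sequences such as a four-digit year are deliberately not flagged under the default limit, since setting them side by side would not fit within one character cell.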

Slanted Text

Slanted text is used in the same way as italic or oblique text in other languages: for narration, off-screen dialogue, and forced narratives. One feature unique to Japanese subtitles, however, is that the slant runs in different directions for horizontal and vertical subtitles; furthermore, the angle of the slant is not necessarily constant and may vary. This is illustrated in Figure 6 and Figure 7.

Figure 6: Example of horizontal slanted text
Figure 7: Example of vertical slanted text

Sourcing Japanese Subtitles

Subtitle assets in the entertainment industry are primarily present in one of two forms: structured text/binary files or rendered images. Netflix has always required the former for its content ingestion system. There are several reasons for this requirement. First, different clients have different subtitle capabilities, requiring us to be able to produce many variations of client assets from a single source. In addition, text subtitle sources are future-proof. That is, as new device capabilities emerge, we can apply those to our large back-catalog of subtitle assets. As an example, when displaying subtitles on an HDR device playing HDR content, it is useful to specify the luminance gain so that white text is not a max-white specular highlight. With text sources, we can easily go back and reprocess to produce subtitles for a client profile that supports luminanceGain. If we had ingested image sources, on the other hand, it would have been difficult to add this sort of functionality to client assets. Further, image assets are opaque, while text assets are far more amenable to search and to analysis based on natural language processing.

With text sources as a “must-have” requirement, we reviewed the available options for Japanese, and Videotron Lambda (also called ‘LambdaCap’ format) was chosen as the only workable model for Japanese subtitles. There were several reasons for this decision. From our analysis, we determined that the LambdaCap format:

  • is reasonably open, allowing us to develop our own tools and workflows.
  • is currently the most common subtitle format that Japanese subtitle tools can support. This was a key driver of our decision because it meant that the established Japanese subtitle industry could produce subtitles for Netflix.
  • is the most common archive format for existing Japanese subtitles. This was another key driver, because supporting LambdaCap meant that we could ingest existing assets without any transformation requirements.
  • supports the essential Japanese features as described above.
  • has been widely used in the industry to create image-based subtitle files for burn-in. Thus, it is well-tested.

Although we chose the Videotron Lambda model for the Japanese launch, it did not seem like a great long-term option. It is not a de jure industry standard, and there are some ambiguities in the specification. The LambdaCap format supports the essential Japanese subtitling features very well but lacks some of the rudimentary features supported in Web Platform standards such as TTML1. Examples of such features include color, font information, and various layout and composition primitives. In addition, we chose not to use LambdaCap as the delivery model to the playback devices in the Netflix eco-system. Concurrently, the Timed Text Working Group (TTWG) was working on the second version of the TTML standard (TTML2). One of the stated objectives of TTML2 was the ability to support global subtitles, with Japanese subtitles being a key use case. This became the basis for our collaboration with TTWG on the TTML2 standardization effort, including helping to complete the specification using our experience as well as the implementations described below. TTML2 became the canonical representation for all source formats in our subtitle processing pipeline.

Mapping of Japanese Features to TTML2

Table 1 summarizes the mapping between the essential Japanese subtitling features described above and the constructs offered by TTML2. It also shows the usage statistics for these features across the Netflix Japanese subtitles catalog as well as the preferred mode for their usage in the Netflix eco-system. The other features not yet used or not used significantly are expected to be used more widely in the future†. The following sections provide details on each feature, in particular regarding the supported values.

Table 1: A summary of mapping between Japanese subtitling features and TTML2 styling constructs

Rubies

tts:ruby

This styling attribute specifies structural aspects of ruby content, including mechanisms that define carriage of ruby base text as well as ruby annotations. The range of values associated with tts:ruby maps to corresponding HTML markup elements. As shown in the associated TTML snippet, a ruby “container” markup encompasses both ruby base and annotation text, while ruby “base” and “text” mark up the base text and the annotation text, respectively. Figure 8 shows the expected rendering associated with this snippet.

<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:frameRate="30" ttp:frameRateMultiplier="1000 1001" ttp:version="2" tts:extent="1280px 720px" xml:lang="ja">
 <head>
  <styling>
   <initial tts:fontSize="6.0vh" tts:lineHeight="7.5vh" tts:showBackground="whenActive" tts:textOutline="black 0.1em"/>
   <style xml:id="s2" tts:textAlign="start"/>
   <style xml:id="s3" tts:ruby="container"/>
   <style xml:id="s4" tts:ruby="base"/>
   <style xml:id="s5" tts:ruby="text" tts:rubyPosition="before"/>
   <style xml:id="s7" tts:textAlign="center"/>
   <style xml:id="s8" tts:ruby="text" tts:rubyPosition="after"/>  
  </styling>
  <layout>
   <region xml:id="横下" tts:displayAlign="after" tts:extent="80vw 30vh" tts:position="center bottom 10vh"/>
  </layout>
 </head>
 <body region="横下" xml:space="preserve">
  <div>
   <p begin="00:04:30:13" end="00:04:32:18" style="s7"><span style="s2"><span style="s3"><span style="s4">太孫</span><span style="s5">たいそん</span></span>のペクチョンを連れ<br/><span style="s3"><span style="s4">北漢</span><span style="s8">プッカン</span></span>山に登り</span></p>
  </div>
 </body>
</tt>
Figure 8: A rendering corresponding to above TTML snippet
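To make the container/base/text structure concrete, the following sketch walks a trimmed-down version of the snippet above with Python's standard `xml.etree.ElementTree` and lists each ruby base with its annotation. The helper name `ruby_pairs` is hypothetical, and the code makes simplifying assumptions: each `style` attribute references a single style id, and tts:ruby is always set through a referenced style rather than inline.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTS = "http://www.w3.org/ns/ttml#styling"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed version of the snippet above (styles and one paragraph only).
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="ja">
 <head><styling>
  <style xml:id="s3" tts:ruby="container"/>
  <style xml:id="s4" tts:ruby="base"/>
  <style xml:id="s5" tts:ruby="text" tts:rubyPosition="before"/>
  <style xml:id="s8" tts:ruby="text" tts:rubyPosition="after"/>
 </styling></head>
 <body><div>
  <p><span style="s3"><span style="s4">太孫</span><span style="s5">たいそん</span></span>のペクチョンを連れ<br/><span style="s3"><span style="s4">北漢</span><span style="s8">プッカン</span></span>山に登り</p>
 </div></body>
</tt>"""


def ruby_pairs(ttml: str):
    """Return a list of (base, annotation, rubyPosition) tuples."""
    root = ET.fromstring(ttml)
    # Map each style id to its (tts:ruby, tts:rubyPosition) values.
    roles = {}
    for style in root.iter(f"{{{TT}}}style"):
        roles[style.get(XML_ID)] = (
            style.get(f"{{{TTS}}}ruby"),
            style.get(f"{{{TTS}}}rubyPosition"),
        )
    pairs = []
    for span in root.iter(f"{{{TT}}}span"):
        if roles.get(span.get("style"), (None, None))[0] != "container":
            continue
        base = annotation = position = None
        for child in span:
            role, pos = roles.get(child.get("style"), (None, None))
            if role == "base":
                base = child.text
            elif role == "text":
                annotation, position = child.text, pos
        pairs.append((base, annotation, position))
    return pairs


if __name__ == "__main__":
    for pair in ruby_pairs(SAMPLE):
        print(pair)
```

A production parser would also need to handle style chaining, inline tts:ruby attributes, and multiple style references, none of which this sketch attempts.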

tts:rubyPosition

This styling attribute specifies positioning of rubies in the block progression dimension relative to the base text. We observed that for the Japanese subtitles use case, tts:rubyPosition=“top” and tts:rubyPosition=“bottom” are less than ideal because they do not provide for unanticipated word wrapping, in which case the second line of text should ideally have rubies below. True to its name, the behavior of tts:rubyPosition=“auto” automatically covers these semantics. This is illustrated in the accompanying TTML snippet. Figure 9 illustrates the expected rendering associated with this snippet. We also note that the behavior of “auto” is currently only specified for exactly two-line events, and will not cover the use case of an unanticipated line break on the second line of a two-line event. We believe that the current behavior described in TTML2 for “outside” is the correct model, and perhaps “auto” could be retired in favor of “outside”.

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tt="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:tickRate="10000000" ttp:version="2" xml:lang="ja">
 <head>
  <styling>
   <initial tts:backgroundColor="transparent" tts:color="white" tts:fontSize="6.000vh"/>
   <style xml:id="style0" tts:textAlign="center"/>
   <style xml:id="style1" tts:textAlign="start"/>
   <style xml:id="style2" tts:ruby="container" tts:rubyPosition="auto"/>
   <style xml:id="style3" tts:ruby="base"/>
   <style xml:id="style4" tts:ruby="text"/>
   <style xml:id="style5" tts:ruby="text"/>
  </styling>
  <layout>
   <region xml:id="region0" tts:displayAlign="after"/>
  </layout>
 </head>
 <body xml:space="preserve">
  <div>
   <p xml:id="subtitle1" begin="18637368750t" end="18676157500t" region="region0" style="style0"><span style="style1">テソプ<span style="style2"><span style="style3">の所だ</span><span style="style4">カン食ン</span></span>食おう<br/><span style="style2"><span style="style3">江陵</span><span style="style5">カンヌン</span></span>で刺身でも食おう</span></p>
  </div>
 </body>
</tt>
Figure 9: A rendering corresponding to above TTML snippet
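The begin and end values in the snippet above are tick-based time expressions: the trailing "t" marks a tick count, and ttp:tickRate gives the number of ticks per second. The sketch below converts such values to seconds; the function name `ticks_to_seconds` is an illustrative choice, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def ticks_to_seconds(expr: str, tick_rate: int) -> Fraction:
    """Convert a TTML tick expression such as '18637368750t' to seconds."""
    if not expr.endswith("t"):
        raise ValueError(f"not a tick-based time expression: {expr!r}")
    return Fraction(int(expr[:-1]), tick_rate)


if __name__ == "__main__":
    rate = 10_000_000  # ttp:tickRate from the snippet above
    begin = ticks_to_seconds("18637368750t", rate)
    end = ticks_to_seconds("18676157500t", rate)
    print(float(begin), float(end), float(end - begin))
```

With a tick rate of 10,000,000, the subtitle in the snippet begins at 1863.736875 seconds and lasts 3.878875 seconds.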

tts:rubyAlign

This styling attribute specifies the position of ruby text within the inline area that is generated by the ruby container. Given our experience with Japanese subtitles, the preferred value of tts:rubyAlign is “center”.

<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tt="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:tickRate="10000000" ttp:version="2" xml:lang="ja">
 <head>
  <styling>
   <initial tts:backgroundColor="transparent" tts:color="white" tts:fontSize="6.000vh" tts:lineHeight="7.500vh" tts:opacity="1.000" tts:showBackground="whenActive" tts:writingMode="lrtb" tts:rubyAlign="center"/>
   <style xml:id="style0" tts:textAlign="center"/>
   <style xml:id="style1" tts:textAlign="start"/>
   <style xml:id="style2" tts:ruby="container"/>
   <style xml:id="style3" tts:ruby="base"/>
   <style xml:id="style4" tts:ruby="text" tts:rubyPosition="auto"/>
  </styling>
  <layout>
   <region xml:id="region0" tts:displayAlign="after" tts:extent="80.000% 30.000%" tts:position="center bottom 10vh" tts:showBackground="whenActive"/>
  </layout>
 </head>
 <body xml:space="preserve">
  <div>
   <p xml:id="subtitle1" begin="18637368750t" end="18676157500t" region="region0" style="style0"><span style="style1">テソプ<span style="style2"><span style="style3">の所だ</span><span style="style4">カン食ン</span></span>食おう</span></p>
   <p xml:id="subtitle2" begin="18676991666t" end="18717031666t" region="region0" style="style0"><span style="style1">テソプ<span style="style2"><span style="style3">の所だ</span><span style="style4">カン食ンカン食ン</span></span>食おう</span></p>
  </div>
 </body>
</tt>

The illustrations below were obtained from the above TTML snippet and they serve to describe the behavior in two cases, when the base text is wider than the ruby text and vice-versa. In both cases, the base text corresponds to ‘の所だ’ (3 Unicode characters) and the ruby alignment value is “center”.

Case 1
In this case (shown in Figure 10), the rendered width of the ruby text is smaller than that of the base text.

Figure 10: Rendered width of ruby text is smaller than base text

Case 2
In this case (shown in Figure 11), the width of the ruby text is greater than the width of the base text. We note that in both cases, the ruby text is centered with respect to the base text.

Figure 11: Rendered width of ruby text is larger than base text
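The arithmetic behind the two cases can be sketched as follows. This is a simplified model, not the TTML2 layout algorithm: it assumes the ruby container is as wide as the wider of the two runs, and the example widths assume each base character is 1em wide and each ruby character 0.5em wide. The function name `center_offsets` is hypothetical.

```python
def center_offsets(base_width: float, ruby_width: float) -> tuple[float, float]:
    """Return (base_offset, ruby_offset) from the container's start edge.

    Both runs are centered in a container as wide as the wider run, so
    the narrower run is inset by half the width difference.
    """
    container = max(base_width, ruby_width)
    return (container - base_width) / 2, (container - ruby_width) / 2


if __name__ == "__main__":
    # Case 1: 3 base characters (3em) with 4 ruby characters (2em).
    print(center_offsets(3.0, 2.0))
    # Case 2: 3 base characters (3em) with 8 ruby characters (4em).
    print(center_offsets(3.0, 4.0))
```

In the first case the ruby is inset by 0.5em on each side of the base text; in the second, the base text is inset by 0.5em inside the wider ruby.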

tts:rubyReserve

The intent of this feature is to maintain temporal consistency in placement of the base text along the block progression direction as we move from subtitles with only base text to those with base text that is annotated with rubies (and vice versa). We note that this feature can also be used to preserve base text alignment across time when boutens are used.

<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tt="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:tickRate="10000000" ttp:version="2" xml:lang="ja">
 <head>
  <styling>
   <initial tts:backgroundColor="transparent" tts:color="white" tts:fontSize="6.000vh" tts:lineHeight="7.500vh" tts:opacity="1.000" tts:showBackground="whenActive" tts:writingMode="lrtb" tts:rubyReserve="auto"/>
   <style xml:id="style0" tts:fontShear="16.78842%" tts:textAlign="center"/>
   <style xml:id="style1" tts:fontShear="16.78842%" tts:textAlign="start"/>
   <style xml:id="style2" tts:fontShear="16.78842%" tts:ruby="container"/>
   <style xml:id="style3" tts:fontShear="16.78842%" tts:ruby="base"/>
   <style xml:id="style4" tts:fontShear="16.78842%" tts:ruby="text" tts:rubyPosition="auto"/>
  </styling>
  <layout>
   <region xml:id="region0" tts:displayAlign="after" tts:extent="80.000% 30.000%" tts:position="center bottom 10vh" tts:showBackground="whenActive"/>
  </layout>
 </head>
 <body xml:space="preserve">
  <div>
   <p xml:id="subtitle1" begin="18623187916t" end="18635700416t" region="region0" style="style0"><span style="style1">“行き先は?”</span></p>
   <p xml:id="subtitle2" begin="18637368750t" end="18676157500t" region="region0" style="style0"><span style="style1">“テソプの所だ<br/><span style="style2"><span style="style3">江陵</span><span style="style4">カンヌン</span></span>で刺身でも食おう”</span></p>
  </div>
 </body>
</tt>