Development Blog

2022.09.06

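Using DeepL Translation from Python (Part 2)

This article is the second part of a series on using the DeepL translation API, and it covers file translation. (See Part 1 for the first installment.)
Note: Basic knowledge of Python and Linux is assumed.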

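Translating a text file

With the warm-up done, let's move on to file translation.
This requires combining three API calls.
Prepare a text file, test.txt, containing some Japanese text.
We will build a procedure that translates it into American English and saves the result as test.trans.txt.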

Step 1: Upload the source-language file

The source-language file is specified when sending the translation request to the DeepL server.

 $ curl https://api.deepl.com/v2/document \
     -F file=@test.txt \
     -F auth_key=${auth_key} \
     -F target_lang=en-us

As before, the server replies in JSON, returning the document_id and document_key of the file it accepted for translation.
In Python, the processing up to sending the request looks like this:

 url = 'https://api.deepl.com/v2/document'
 files = dict()
 files['file'] = open(fn, 'rb')           # fn: path of the source file, e.g. 'test.txt'
 files['auth_key'] = (None, get_key())    # (None, value) sends a plain form field
 files['target_lang'] = (None, 'en-us')
 res = requests.post(url, files=files)

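The way auth_key and target_lang are specified is rather tricky.
There are several ways to specify an entry in files; with a 2-tuple, the first item is the file name and the second is the object.
To send a plain string, pass None as the file name, as shown above.
Alternatively, auth_key and target_lang can be stored in data (a dict), as in the earlier text translation, and both data and files can be passed to requests.post().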
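The alternative mentioned above, passing both data and files, can be sketched as follows. This is only an illustration: the helper name build_upload_request and the key 'dummy-key' are made up for the example, and the request is prepared with requests.Request(...).prepare() rather than sent, so the resulting multipart body can be inspected without contacting DeepL.

```python
import io

import requests


def build_upload_request(file_name, file_obj, auth_key, target_lang='en-us'):
    """Prepare (but do not send) the step 1 upload request.

    Plain string fields go into data, the file itself into files.
    """
    url = 'https://api.deepl.com/v2/document'
    data = {'auth_key': auth_key, 'target_lang': target_lang}
    files = {'file': (file_name, file_obj)}  # 2-tuple: (file name, object)
    return requests.Request('POST', url, data=data, files=files).prepare()


# An in-memory stand-in for test.txt, and a dummy key (both hypothetical).
src = io.BytesIO('こんにちは'.encode('utf-8'))
prepared = build_upload_request('test.txt', src, 'dummy-key')
print(prepared.headers['Content-Type'])
```

To actually send the request, the same data and files dictionaries can be passed directly: requests.post(url, data=data, files=files).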

Step 2: Get the translation status

Query the translation status of the file being translated:

 $ document_id=[documentID]
 $ document_key=[documentKey]
 $ curl https://api.deepl.com/v2/document/${document_id} \
     -d auth_key=${auth_key} \
     -d document_key=${document_key}

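There are four statuses: queued (request accepted), translating (translation in progress), error (a translation error occurred), and done (translation finished).
For translating, an estimate of the remaining processing time is also returned.
Assuming the variables document_id and document_key have already been assigned, the processing up to sending the request looks like this in Python: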

 url = f'https://api.deepl.com/v2/document/{document_id}'
 data = dict()
 data['auth_key'] = get_key()
 data['document_key'] = document_key
 res = requests.post(url, data=data)
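The snippet above stops at sending the request. As a rough sketch, the JSON reply could be read as follows. The helper name parse_status and the sample replies are made up for illustration; only the status and seconds_remaining fields described above are assumed to be present.

```python
import json


def parse_status(res_text):
    """Return (status, seconds_remaining) from the JSON text of a status reply.

    seconds_remaining is None when the reply does not contain it.
    """
    res_data = json.loads(res_text)
    status = res_data['status']
    seconds_remaining = res_data.get('seconds_remaining')
    if seconds_remaining is not None:
        seconds_remaining = int(seconds_remaining)
    return status, seconds_remaining


# Made-up sample replies, not real server output.
sample_translating = '{"status": "translating", "seconds_remaining": 20}'
sample_done = '{"status": "done"}'
print(parse_status(sample_translating))
print(parse_status(sample_done))
```

With a real response, this would be called as parse_status(res.text).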

Step 3: Download the target-language file

Once the translation status is done, download the file:

 $ curl https://api.deepl.com/v2/document/${document_id}/result \
     -d auth_key=${auth_key} \
     -d document_key=${document_key} > test.trans.txt

The result can be downloaded only once.
Here the translation result is redirected to (saved as) test.trans.txt.
In Python, the processing up to sending the request looks like this:

 url = f'https://api.deepl.com/v2/document/{document_id}/result'
 data = dict()
 data['auth_key'] = get_key()
 data['document_key'] = document_key
 res = requests.post(url, data=data)
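The snippet above also stops at sending the request. Because the result can be fetched only once, one cautious way to store it is to write the raw bytes to a temporary file first and rename it only after the write succeeds. The sketch below is one possible approach, not part of the DeepL API; the helper name save_result is made up, and the demo uses dummy bytes in place of res.content.

```python
import os
import tempfile


def save_result(content, fn):
    """Write downloaded bytes to fn via a temporary file in the same directory."""
    dir_name = os.path.dirname(os.path.abspath(fn))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(content)          # raw bytes, no decode/encode round trip
        os.replace(tmp_path, fn)      # rename only after a complete write
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)       # do not leave a partial file behind
        raise


# Demo with dummy bytes standing in for res.content.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'test.trans.txt')
save_result('Hello'.encode('utf-8'), demo_path)
```

With a real response, the call would be save_result(res.content, 'test.trans.txt').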

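File translation program

Finally, we connect the steps above.
Step 2 only returns the translation progress, so we need a wait loop driven by its result. Basically:
・For translating, sleep for the number of seconds given in seconds_remaining, then check the progress again.
・On done or error, leave the loop.
In Python, the whole process looks like this: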

import requests
import json
from time import sleep

def get_key():
    # Read the API key from key.txt, dropping the trailing newline
    with open('key.txt') as f:
        return f.read().rstrip()

def upload_src(fn):
    '''
    File translation step 1: upload the source-language file
    '''
    url = 'https://api.deepl.com/v2/document'
    with open(fn, 'rb') as f: # close the file once the upload has been sent
        files = dict()
        files['file'] = f
        files['auth_key'] = (None, get_key())
        files['target_lang'] = (None, 'en-us')

        res = requests.post(url, files=files)
    res_status = res.status_code # 200 on success (not used for now)
    res_text = res.text
    res_data = json.loads(res_text)
    document_id = res_data['document_id']
    document_key = res_data['document_key']
    return document_id, document_key

def get_trans_status(document_id, document_key):
    '''
    Helper for step 2: get the translation status
    '''
    url = f'https://api.deepl.com/v2/document/{document_id}'
    data = dict()
    data['auth_key'] = get_key()
    data['document_key'] = document_key

    res = requests.post(url, data=data)
    res_status = res.status_code # 200 on success (not used for now)
    res_text = res.text
    res_data = json.loads(res_text)
    return res_data

def check_proceeding(document_id, document_key):
    '''
    File translation step 2: wait for the translation to finish
    '''
    while True:
        res = get_trans_status(document_id, document_key)
        status = res['status']
        print(f'status: {status}', flush=True)
        # Default wait, used for queued or when no usable estimate is returned
        # (a wait of 0 would poll the server in a tight loop)
        seconds_remaining = 10
        if status == 'done' or status == 'error':
            break
        elif status == 'translating':
            if 'seconds_remaining' in res:
                seconds_remaining = int(res['seconds_remaining'])
                # Guard against zero or negative values
                if seconds_remaining <= 0:
                    seconds_remaining = 10
        else: # queued, etc.
            pass
        print(f'...waiting for (another) {seconds_remaining}s', flush=True)
        sleep(seconds_remaining)
    return status

def download_tgt(fn, document_id, document_key):
    '''
    File translation step 3: download the target-language file
    '''
    url = f'https://api.deepl.com/v2/document/{document_id}/result'
    data = dict()
    data['auth_key'] = get_key()
    data['document_key'] = document_key

    res = requests.post(url, data=data)
    res_status = res.status_code # 200 on success (not used for now)
    tgt_bin = res.content # response body as bytes
    with open(fn, 'w', encoding='utf-8') as f:
        print(tgt_bin.decode('utf-8'), end='', file=f)

def main():
    fn_src = 'test.txt'
    fn_tgt = fn_src.replace('.txt', '.trans.txt')

    print(f'fn_src: {fn_src}')
    print(f'fn_tgt: {fn_tgt}')

    print(f'uploading: {fn_src}')
    document_id, document_key = upload_src(fn_src)
    status = check_proceeding(document_id, document_key)
    if status == 'done': # download only when the translation succeeded
        print(f'downloading: {fn_tgt}')
        download_tgt(fn_tgt, document_id, document_key)

if __name__ == '__main__':
    main()
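The program records res.status_code at each step but does not use it yet. One possible way to put it to work is a small check that stops the run when a call does not return 200. This is a sketch only: the helper name ensure_ok is made up, and the demo uses stand-in objects that carry just the two attributes the check reads, not real responses.

```python
from types import SimpleNamespace


def ensure_ok(res, step):
    """Return res unchanged if its HTTP status is 200, otherwise raise RuntimeError."""
    if res.status_code != 200:
        raise RuntimeError(f'{step} failed: HTTP {res.status_code}: {res.text}')
    return res


# Stand-ins for requests responses (hypothetical values for the demo).
ok_res = SimpleNamespace(status_code=200, text='{}')
ng_res = SimpleNamespace(status_code=403, text='')

ensure_ok(ok_res, 'upload')
try:
    ensure_ok(ng_res, 'upload')
except RuntimeError as e:
    print(e)
```

In the program, it could wrap each call, for example res = ensure_ok(requests.post(url, data=data), 'status check').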

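The three kinds of API calls do much the same thing, but in keeping with the theme of "writing curl commands in Python", I have deliberately left them unconsolidated and verbose.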
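For readers who would rather consolidate them, the shared part of the three calls (the URL and the form fields) could be built in one place. The following is just one possible shape, not the article's program; build_post_args and its parameters are hypothetical names, and 'dummy-key', 'ID', and 'KEY' are placeholder values.

```python
BASE_URL = 'https://api.deepl.com/v2/document'


def build_post_args(auth_key, document_id=None, document_key=None,
                    result=False, **fields):
    """Build the url and data keyword arguments for requests.post().

    No document_id: step 1 (upload). With document_id: step 2 (status).
    With document_id and result=True: step 3 (download).
    """
    url = BASE_URL
    if document_id is not None:
        url += f'/{document_id}'
        if result:
            url += '/result'
    data = {'auth_key': auth_key, **fields}
    if document_key is not None:
        data['document_key'] = document_key
    return {'url': url, 'data': data}


upload_args = build_post_args('dummy-key', target_lang='en-us')
status_args = build_post_args('dummy-key', 'ID', 'KEY')
result_args = build_post_args('dummy-key', 'ID', 'KEY', result=True)
print(upload_args['url'], status_args['url'], result_args['url'], sep='\n')
```

Each dictionary can then be unpacked into the call, for example requests.post(**status_args), or requests.post(**upload_args, files={'file': f}) for the upload.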

To sum up, in this article we:

・Translated a file using the DeepL API.
・Wrote a program by expressing the contents of curl commands in Python.

Thank you for reading.