---
license: apache-2.0
language:
- zh
- ja
- ar
- en
- hi
metrics:
- accuracy
library_name: allennlp
---
## Language Identification

This is a language identification model trained with AllenNLP on the [qgyd2021/language_identification](https://huggingface.co/datasets/qgyd2021/language_identification) dataset.


Accuracy on the validation split:

| Language | Samples | Accuracy |
| :--- | :----: |  ------: |
|  af  |  6221  |  0.8666  |
|  ar  |  19808  |  0.9994  |
|  bg  |  19913  |  0.9958  |
|  bn  |  7396  |  0.9968  |
|  bs  |  1653  |  0.8232  |
|  cs  |  19122  |  0.9615  |
|  da  |  19500  |  0.9727  |
|  de  |  19702  |  0.996  |
|  el  |  19455  |  0.9761  |
|  en  |  39710  |  0.9942  |
|  eo  |  18542  |  0.9944  |
|  es  |  19924  |  0.9937  |
|  et  |  19482  |  0.9727  |
|  fi  |  19223  |  0.9554  |
|  fo  |  4612  |  0.9697  |
|  fr  |  19990  |  0.9957  |
|  ga  |  19949  |  0.9973  |
|  gl  |  508  |  0.822  |
|  hi  |  19984  |  0.9965  |
|  hi_en  |  1358  |  0.951  |
|  hr  |  18840  |  0.9789  |
|  hu  |  669  |  0.8873  |
|  hy  |  124  |  0.9688  |
|  id  |  4669  |  0.9968  |
|  is  |  19795  |  0.9876  |
|  it  |  19742  |  0.9941  |
|  ja  |  20130  |  0.9996  |
|  ko  |  20098  |  0.9998  |
|  lt  |  19280  |  0.9721  |
|  lv  |  19459  |  0.9931  |
|  mr  |  10300  |  0.9961  |
|  mt  |  19708  |  0.993  |
|  nl  |  18452  |  0.9258  |
|  no  |  19404  |  0.9714  |
|  pl  |  19920  |  0.9973  |
|  pt  |  19996  |  0.9946  |
|  ro  |  19804  |  0.9944  |
|  ru  |  20003  |  0.9954  |
|  sk  |  19804  |  0.9861  |
|  sl  |  19665  |  0.9926  |
|  sv  |  18941  |  0.95  |
|  sw  |  19768  |  0.9871  |
|  th  |  19917  |  0.9991  |
|  tl  |  19572  |  0.9991  |
|  tn  |  19883  |  0.9933  |
|  tr  |  19809  |  0.9939  |
|  ts  |  19752  |  0.9854  |
|  uk  |  17643  |  0.9994  |
|  ur  |  19895  |  0.992  |
|  vi  |  19836  |  0.9982  |
|  yo  |  1936  |  0.9827  |
|  zh  |  40108  |  0.9996  |
|  zu  |  5406  |  0.9905  |
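
The per-language figures above can be summarized in two ways: a macro average that weights every language equally, and a micro (sample-weighted) average that is dominated by the larger classes. The sketch below is illustrative only. It copies three rows from the table (`af`, `ar`, `gl`) rather than the full set, and the helper names `macro_accuracy` and `micro_accuracy` are hypothetical, not part of the model repository.

```python
from typing import List, Tuple

# (language, sample count, accuracy) copied from three rows of the table above.
# This is a subset chosen for illustration, not the full validation set.
ROWS: List[Tuple[str, int, float]] = [
    ("af", 6221, 0.8666),
    ("ar", 19808, 0.9994),
    ("gl", 508, 0.822),
]


def macro_accuracy(rows: List[Tuple[str, int, float]]) -> float:
    """Unweighted mean of the per-language accuracies."""
    return sum(acc for _, _, acc in rows) / len(rows)


def micro_accuracy(rows: List[Tuple[str, int, float]]) -> float:
    """Mean of the per-language accuracies weighted by sample count."""
    total = sum(n for _, n, _ in rows)
    return sum(n * acc for _, n, acc in rows) / total


print("macro:", round(macro_accuracy(ROWS), 4))
print("micro:", round(micro_accuracy(ROWS), 4))
```

Because low-resource languages such as `gl` contribute few samples, the micro average sits closer to the accuracy of the large classes, while the macro average exposes the weaker ones.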


Test code:
```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import argparse
import time

from allennlp.models.archival import load_archive
from allennlp.predictors.text_classifier import TextClassifierPredictor

# project_settings is a local module of the training project; project_path is a path object.
from project_settings import project_path


def get_args():
    """
    Parse command-line arguments.

    Usage: python3 step_5_predict_by_archive.py --text "hello guy."
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--text",
        default="hello guy.",
        type=str
    )
    parser.add_argument(
        "--archive_file",
        default=(project_path / "trained_models/language_identification").as_posix(),
        type=str
    )
    args = parser.parse_args()
    return args


def main():
    args = get_args()

    # Load the trained model archive.
    archive = load_archive(archive_file=args.archive_file)

    predictor = TextClassifierPredictor(
        model=archive.model,
        dataset_reader=archive.dataset_reader,
    )

    # TextClassifierPredictor reads the input text from the "sentence" key.
    json_dict = {
        "sentence": args.text
    }

    begin_time = time.perf_counter()
    outputs = predictor.predict_json(
        json_dict
    )
    label = outputs["label"]
    # The largest entry of "probs" is the probability of the predicted label.
    prob = round(max(outputs["probs"]), 4)
    print(label)
    print(prob)

    print('time cost: {}'.format(time.perf_counter() - begin_time))


if __name__ == '__main__':
    main()

```
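
The script prints the top label together with its probability. A caller may want to reject predictions whose probability is low instead of trusting every label. The following is a minimal sketch, assuming the output dictionary carries the `label` and `probs` keys used in the script above. The function `pick_language`, the `min_prob` threshold, and the `"unknown"` fallback are hypothetical and not part of AllenNLP or this repository; the example dictionary is hand-written, not real model output.

```python
from typing import Any, Dict, Tuple


def pick_language(
    outputs: Dict[str, Any],
    min_prob: float = 0.5,
    fallback: str = "unknown",
) -> Tuple[str, float]:
    """Return (label, probability), or (fallback, probability) below the threshold."""
    prob = max(outputs["probs"])
    label = outputs["label"] if prob >= min_prob else fallback
    return label, round(prob, 4)


# Hand-written stand-in for the dictionary returned by predictor.predict_json(...).
example_outputs = {"label": "en", "probs": [0.1, 0.85, 0.05]}
print(pick_language(example_outputs))
```

The threshold value is a design choice that depends on the application; 0.5 here is only a placeholder.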

requirements.txt
```text
allennlp==2.10.1
allennlp-models==2.10.1
torch==1.12.1
overrides==1.9.0
pytorch_pretrained_bert==0.6.2
```