
Dots.OCR

Object detection
MIT License
Rednote

dots.ocr is a multilingual document parser that unifies layout detection and content recognition in a single vision-language model while preserving reading order. Despite its compact 1.7B-parameter LLM foundation, it achieves state-of-the-art (SOTA) performance.

  1. Powerful Performance: dots.ocr achieves SOTA performance for text, tables, and reading order on OmniDocBench, while delivering formula recognition results comparable to much larger models like Doubao-1.5 and gemini2.5-pro.
  2. Multilingual Support: dots.ocr demonstrates robust parsing capabilities for low-resource languages, achieving decisive advantages across both layout detection and content recognition on our in-house multilingual documents benchmark.
  3. Unified and Simple Architecture: By leveraging a single vision-language model, dots.ocr offers a significantly more streamlined architecture than conventional methods that rely on complex, multi-model pipelines. Switching between tasks is accomplished simply by altering the input prompt, proving that a VLM can achieve competitive detection results compared to traditional detection models like DocLayout-YOLO.
  4. Efficient and Fast Performance: Built upon a compact 1.7B LLM, dots.ocr provides faster inference speeds than many other high-performing models based on larger foundations.

Limitations

  1. Complex Document Elements:

    • Tables and formulas: dots.ocr does not yet reliably extract highly complex tables and formulas.
    • Picture: Pictures in documents are currently not parsed.
  2. Parsing Failures: The model may fail to parse under certain conditions:

    • When the character-to-pixel ratio is excessively high. Try enlarging the image or increasing the PDF parsing DPI (a setting of 200 is recommended). However, please note that the model performs optimally on images with a resolution under 11289600 pixels.
    • Continuous special characters, such as ellipses (...) and underscores (_), may cause the prediction output to repeat endlessly. In such scenarios, consider using alternative prompts like prompt_layout_only_en, prompt_ocr, or prompt_grounding_ocr.
  3. Performance Bottleneck: Despite its 1.7B parameter LLM foundation, dots.ocr is not yet optimized for high-throughput processing of large PDF volumes.
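
The resolution guidance in the limitations above (render PDFs at about 200 DPI, keep images under 11289600 pixels) can be checked before an image is sent to the model. The following is a minimal sketch, not part of dots.ocr itself: the function names `fit_within_pixel_budget` and `rendered_size` are hypothetical helpers, and reading the limit as a cap on total pixel count (width × height) is an assumption.

```python
import math

MAX_PIXELS = 11_289_600  # pixel budget quoted in the limitations above
RECOMMENDED_DPI = 200    # PDF rendering DPI recommended above


def fit_within_pixel_budget(width, height, max_pixels=MAX_PIXELS):
    """Return (new_width, new_height) whose area is <= max_pixels.

    Hypothetical helper: the aspect ratio is preserved (up to rounding),
    and images already within the budget are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    # Scale both sides by the same factor so the area shrinks to the budget.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


def rendered_size(page_width_pt, page_height_pt, dpi=RECOMMENDED_DPI):
    """Pixel size of a PDF page given in points (1/72 inch) when rendered at dpi."""
    return round(page_width_pt * dpi / 72), round(page_height_pt * dpi / 72)


if __name__ == "__main__":
    # An A4 page (595 x 842 pt) rendered at the recommended DPI.
    w, h = rendered_size(595, 842)
    print(w, h, w * h <= MAX_PIXELS)
    # An oversized scan that has to be shrunk before parsing.
    print(fit_within_pixel_budget(10000, 8000))
```

Under this reading, an A4 page rendered at 200 DPI stays well inside the budget, so downscaling should only matter for large scans or higher DPI settings.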