You are a machine learning specialist working for a language translation department of a major university.

Your university has developed a mobile/web app that translates between languages.

You are now adding some of the lesser-known languages of the far northern Arctic, such as Inuktun, Nganasan, and Dolgan.

These languages are spoken by very few people in their regions, so you have had to build your own data sources of the language patterns for each region. Your machine learning team has decided to use Amazon Kendra to build an indexed, searchable document repository.

Your team needs to use Kendra to explore its language data so that the data can be cleaned and prepared for use in your language translation software.

Your team has created a Kendra index and has added the documents stored in your S3 bucket (HTML files, plain text files, PDFs, Word documents, and PowerPoint presentations) to the index using the Kendra BatchPutDocument API call.

However, your CloudWatch logs show an HTTP status code of 400, and some of your documents have not been indexed successfully. What could be the source of the indexing failure?
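As a rough illustration of the submission step in this scenario, the sketch below sends documents in groups of at most 10 (the per-request limit of BatchPutDocument) and collects any entries the service returns in `FailedDocuments`. The helper names `chunked` and `put_documents` are hypothetical, and `client` is assumed to be a boto3 Kendra client created elsewhere (for example with `boto3.client("kendra")`). Because indexing is asynchronous, failures that occur during processing may appear only in CloudWatch and not in this response.

```python
from typing import Any, Dict, Iterator, List, Optional

# BatchPutDocument accepts at most 10 documents per request.
MAX_DOCS_PER_CALL = 10


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_documents(
    client: Any,
    index_id: str,
    documents: List[Dict[str, Any]],
    role_arn: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Submit documents in batches and return every FailedDocuments entry.

    `client` is assumed to expose `batch_put_document`, as a boto3 Kendra
    client does. Each returned entry carries `Id`, `ErrorCode`, and
    `ErrorMessage` as reported by the service.
    """
    failed: List[Dict[str, Any]] = []
    for batch in chunked(documents, MAX_DOCS_PER_CALL):
        kwargs: Dict[str, Any] = {"IndexId": index_id, "Documents": batch}
        if role_arn is not None:
            # Role that lets Kendra read documents referenced by S3Path.
            kwargs["RoleArn"] = role_arn
        response = client.batch_put_document(**kwargs)
        failed.extend(response.get("FailedDocuments", []))
    return failed
```

A document entry for a Word file in S3 would look roughly like `{"Id": "doc-1", "S3Path": {"Bucket": "my-bucket", "Key": "texts/doc-1.docx"}, "ContentType": "MS_WORD"}`, where the bucket and key are placeholders.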

Exam-Answer · Accepted Answer

The text extracted from an individual Word document exceeds 5 MB, the maximum amount of extracted text that Amazon Kendra accepts per document.
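One way to catch this condition before indexing is to measure the extracted text of each document against the 5 MB limit. The sketch below is a minimal pre-check under stated assumptions: the text has already been extracted by some separate step, size is measured as UTF-8 bytes, and 5 MB is taken as 5 × 1024 × 1024 bytes. How Kendra itself measures the limit is not stated in the question, so treat the threshold as approximate. The function names are hypothetical.

```python
from typing import Dict, List

# Assumption: 5 MB interpreted as 5 MiB; Kendra's exact accounting may differ.
EXTRACTED_TEXT_LIMIT_BYTES = 5 * 1024 * 1024


def extracted_text_size(text: str) -> int:
    """Return the size of the extracted text in UTF-8 bytes."""
    return len(text.encode("utf-8"))


def oversized_documents(extracted: Dict[str, str]) -> List[str]:
    """Return the sorted IDs of documents whose extracted text exceeds the limit.

    `extracted` maps a document ID (for example an S3 key) to the text
    already pulled out of that document by a separate extraction step.
    """
    return sorted(
        doc_id
        for doc_id, text in extracted.items()
        if extracted_text_size(text) > EXTRACTED_TEXT_LIMIT_BYTES
    )
```

Documents flagged by such a check could be split into smaller files before being submitted again. Note that non-ASCII characters occupy more than one byte in UTF-8, so character counts alone can understate the size.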

Indexing Failure with Amazon Kendra
