I have 4 GB of RAM.
Solr is running with 3 GB of that memory.
I am extracting text and metadata using the Apache Tika server (tika-server.jar).
Extraction is taking much longer than usual: a 20 MB file takes 2–3 minutes.
My server is hosted on Amazon's cloud (AWS) and runs Ubuntu 14.04.
I tested the same file on my local machine, and there it extracts the data in 1–2 seconds.
Is any special configuration needed for an Amazon cloud instance? My local machine also has 4 GB of RAM, but it is a Mac running Mac OS.
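To compare the cloud instance and the local machine on equal terms, the same small timing script can be run in both places. This is a minimal sketch: `time_extraction` and `read_only` are hypothetical helpers (not part of tika-python), and `read_only` only reads the file from disk so the harness can be tried before the real extraction call is plugged in.

```python
import os
import time
from typing import Callable, List, Tuple


def time_extraction(
    paths: List[str], extract: Callable[[str], object]
) -> List[Tuple[str, float, float]]:
    """Run `extract` on each file and return (path, seconds, MB per second)."""
    results = []
    for path in paths:
        size_mb = os.path.getsize(path) / (1024 * 1024)
        start = time.perf_counter()
        extract(path)  # hypothetical hook: wrap the real Tika call here
        elapsed = time.perf_counter() - start
        # Guard against a zero elapsed time on very small files.
        rate = size_mb / elapsed if elapsed > 0 else float("inf")
        results.append((path, elapsed, rate))
    return results


def read_only(path: str) -> bytes:
    """Placeholder extractor: just reads the raw bytes (measures disk only)."""
    with open(path, "rb") as fh:
        return fh.read()
```

For the real measurement, the placeholder could be swapped for a tika-python call such as `lambda p: parser.from_file(p, serverEndpoint="http://localhost:9998")`, assuming tika-server is listening on its default port 9998. If even `read_only` is much slower on the cloud instance than locally, that would point at disk or memory on the instance rather than at Tika itself.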
I am using tika-python to index my documents.
I have around 1 million documents in different file formats (PDF, HTML, DOC, PPT, XML, TXT).
Please suggest a remedy, or an alternative to Apache Tika.
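With around a million files of mixed types, it can help to run extraction in bounded batches and record which formats succeed or fail, so one bad document does not stop the run and slow formats can be spotted. The sketch below is an assumption-laden illustration, not a tika-python feature: `extract_all` is a hypothetical helper, and `extract` is any callable you supply (for example one that calls tika-python and then posts the result to Solr). Threads are used on the assumption that the heavy parsing happens inside the tika-server JVM, so the Python side mostly waits on HTTP.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import islice
from typing import Callable, Dict, Iterable, List, Tuple


def extract_all(
    paths: Iterable[str],
    extract: Callable[[str], object],
    workers: int = 4,
    chunk_size: int = 1000,
) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Run `extract` over many files with a small thread pool.

    Files are submitted in chunks of `chunk_size` so that a million paths
    never become a million pending futures in memory at once.
    Returns (successes per lower-cased extension, [(path, error), ...]).
    """
    ok_by_ext: Dict[str, int] = defaultdict(int)
    failures: List[Tuple[str, str]] = []
    it = iter(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            chunk = list(islice(it, chunk_size))
            if not chunk:
                break
            futures = {pool.submit(extract, p): p for p in chunk}
            for fut in as_completed(futures):
                path = futures[fut]
                ext = os.path.splitext(path)[1].lower() or "<none>"
                try:
                    fut.result()
                    ok_by_ext[ext] += 1
                except Exception as exc:  # record the error and keep going
                    failures.append((path, repr(exc)))
    return dict(ok_by_ext), failures
```

On a 4 GB machine where Solr already holds 3 GB, `workers` should probably stay low, since every concurrent request also needs memory inside the tika-server process.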