PDF to text filter using ctx_doc

I am currently using the following script to extract the text from a PDF.

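create or replace directory pdf_dir as '&1';
create or replace directory l_curr_dir as '&3';
declare
  ll_clob     CLOB;
  l_bfile     BFILE;
  l_filename  VARCHAR2(200) := '&2';
begin
  -- Drop leftovers from a previous run; drop the policy first because it
  -- references the preference. Errors (object does not exist) are ignored.
  begin
    ctx_ddl.drop_policy('test_policy1');
  exception
    when others then
      null;
  end;
  begin
    ctx_ddl.drop_preference('testfilter');
  exception
    when others then
      null;
  end;

  ctx_ddl.create_preference('testfilter', 'AUTO_FILTER');
  -- The same policy name must be used for drop, create and filter.
  ctx_ddl.create_policy('test_policy1', 'testfilter');

  l_bfile := bfilename('PDF_DIR', l_filename);
  dbms_lob.fileopen(l_bfile);
  -- Filter the PDF to plain text into ll_clob.
  ctx_doc.policy_filter(
      policy_name => 'test_policy1'
    , document    => l_bfile
    , restab      => ll_clob
    , plaintext   => true
    , charset     => 'US7ASCII'
  );
  dbms_lob.fileclose(l_bfile);
  -- Write the filtered text to the file named by &4.
  dbms_xslprocessor.clob2file(ll_clob, 'L_CURR_DIR', '&4');
end;
/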

This solution works well for me, but is there any way to keep tabular data intact? Right now the
filter returns the text phrase by phrase or line by line. For example,
if the PDF contains values like

Name: Amount
Pradeep 100 USD

I want the output exactly as it appears in the PDF, but the current setup produces:
Name:
Amount
Pradeep
100 USD

Is there any way to keep the original layout of the text within the PDF?

Can anyone tell me whether I need to change the filter, and whether this is possible at all?


Source: oracle
