Gary Browne
2009-09-01 05:55:11 UTC
Hi all,
I have a question about searching PDF documents that I can't seem to
find a definitive answer for:
When a user searches via the DSpace web interface, is the search run
across the content of text PDFs or just the metadata? If the content is
searched, does a PDF submitted to the repository need to have been
OCR'd beforehand, or does the repository attempt to extract and index
text from all PDFs?
Any information regarding this would be greatly appreciated.
Thanks
Gary
Gary Browne
Development Programmer
Library IT Services
University of Sydney
ph: 9351-5946
Sent from my plain old desktop computer.