Thread: crawling pdf files... how???

  1. #1
    Join Date
    Jul 2001
    Location
    jauh nun...
    Posts
    36
    Rep Power
    0

    crawling pdf files... how???

    Assalamualaikum...

    Sorry, I just want to ask one question. I have built a search engine, including the crawling part, but I would like to know how to crawl PDF files. All the PHP scripts I have seen collect URLs through "href", so they only crawl HTML files. Can someone tell me how?

    - thanks =) -
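
Links to PDF files normally sit in the same "href" attributes as links to HTML pages, so a spider that already collects hrefs can pick them out. A minimal sketch of that idea follows, written in Python rather than PHP since the thread shows no code. The names `LinkCollector` and `split_links` are hypothetical, and it assumes documents can be recognised by their file extension alone.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def split_links(html, base_url):
    """Return (page_links, document_links) found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    pages, documents = [], []
    for link in collector.links:
        path = urlparse(link).path.lower()
        # Assumption: PDF and PostScript files are identified by extension.
        if path.endswith((".pdf", ".ps")):
            documents.append(link)
        else:
            pages.append(link)
    return pages, documents
```

The page links can go back into the crawl queue as before, while the document links go to whatever step handles PDF and PostScript files.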

  2. #2
    Join Date
    Jul 2001
    Location
    OCed
    Posts
    252
    Rep Power
    228
    Don't know... but Google can do that, right?
    I can't afford to have a signature here, can somebody sponsor me a signature?

  3. #3
    Join Date
    Jul 2001
    Location
    jauh nun...
    Posts
    36
    Rep Power
    0
    Ehehe... not just Google, my dear. Most search engines for journals and technical papers can do that, such as cora.whizbang, NCSTRL, and others. PDF and PostScript files are more useful and more trustworthy than HTML files =)

  4. #4
    Join Date
    Jul 2001
    Location
    OCed
    Posts
    252
    Rep Power
    228
    Hehehe... *I only know Google* (because it's the ONLY one I use)

    But since PHP can *create* PDF files, I think there should be a way to reverse the process (haha, what am I talking about...), so we can read a PDF as a text file or something similar.
    I can't afford to have a signature here, can somebody sponsor me a signature?

  5. #5
    Join Date
    Jul 2001
    Location
    jauh nun...
    Posts
    36
    Rep Power
    0
    I just wonder... what do you mean by creating PDF files? Can you show me how to do it?
    ~ k | r | i | s | t | a | l ^ p | u | t | i | h ~

  6. #6
    Join Date
    Nov 2001
    Location
    MLK
    Posts
    119
    Rep Power
    221
    http://www.php.net/pdf

    There's a manual there on how to create a PDF file using PDFlib. I'm not sure how to crawl one, but I have one idea.

    First you fetch the PDF file to your web server, then read it and get the links from it. After that I think you can crawl those links, or do whatever you think is suitable.
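
The "read it and get the links" step could look roughly like the sketch below, again in Python rather than PHP. It relies on the fact that a link annotation in a PDF stores its target as `/URI (http://...)`. The function name `links_in_pdf` is hypothetical. It also assumes the annotations are stored uncompressed, so a PDF that packs its objects into compressed streams will return nothing here, and a URL containing an escaped `)` will be cut short.

```python
import re

# A link annotation's action dictionary holds its target as "/URI (...)".
URI_PATTERN = re.compile(rb"/URI\s*\(([^)]*)\)")


def links_in_pdf(data):
    """Return the link targets found in the raw bytes of a PDF file.

    Assumption: the annotations are not inside compressed object streams.
    """
    # A PDF announces itself with "%PDF-" at or near the start of the file.
    if b"%PDF-" not in data[:1024]:
        raise ValueError("data does not look like a PDF file")
    return [match.decode("latin-1") for match in URI_PATTERN.findall(data)]
```

After saving the download to disk, something like `links_in_pdf(open("paper.pdf", "rb").read())` would give a list of URLs to feed back into the crawler. Note the binary mode: reading a PDF as text can corrupt it.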

  7. #7
    Join Date
    Jul 2001
    Location
    jauh nun...
    Posts
    36
    Rep Power
    0
    Thanks for your opinion and help, rokawa. About fetching the PDF file... well, that's my problem! How do I fetch it? I have done a lot with the code, changing things by trial and error, but it didn't work.

    OK, let me explain in a bit more detail. When we crawl an ordinary web page, we use href to get the links (URLs), then grab each page and store its title, description, and URL in our database. That's the spider's job! When a user searches for a term or keyword, we return results based on the data in the database.
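
The spider-and-search flow described above can be sketched as follows, in Python with SQLite standing in for whatever database the real PHP spider uses. The table layout and the names `PageSummary`, `open_index`, `index_page`, and `search` are all hypothetical, and the description is assumed to come from the page's `<meta name="description">` tag.

```python
import sqlite3
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Pull the <title> text and the meta description out of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def open_index(path=":memory:"):
    """Open the index database and create the (assumed) pages table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, title TEXT, description TEXT)"
    )
    return db


def index_page(db, url, html):
    """Spider side: store the title, description, and URL of one page."""
    summary = PageSummary()
    summary.feed(html)
    db.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
        (url, summary.title.strip(), summary.description.strip()),
    )
    db.commit()


def search(db, term):
    """Search side: match the term against stored titles and descriptions."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT url, title FROM pages "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return rows.fetchall()
```

A PDF has no `<title>` or meta tags, so indexing one would need a different way of producing the title and description before calling the same insert.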

    So for the spider part, I can only fetch HTML files with .gif images, not PDF files. I'm still working on it! Anyway, thanks again.
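
One possible cause, offered only as a guess since the spider's code is not shown, is that it treats every download as HTML text. A minimal Python sketch of fetching the raw bytes and deciding what they are is below. The names `fetch` and `classify` are hypothetical. It assumes the server's Content-Type header may be wrong, so the file's first bytes are checked first.

```python
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a URL and return (content_type, raw_bytes)."""
    with urlopen(url, timeout=timeout) as response:
        # Keep the body as bytes; decoding a PDF as text would damage it.
        return response.headers.get_content_type(), response.read()


def classify(content_type, body):
    """Decide how the spider should treat a downloaded resource."""
    # PDF files begin with "%PDF-" and PostScript files with "%!PS".
    if body.startswith(b"%PDF-") or content_type == "application/pdf":
        return "pdf"
    if body.startswith(b"%!PS") or content_type == "application/postscript":
        return "postscript"
    if content_type in ("text/html", "application/xhtml+xml"):
        return "html"
    return "other"
```

With this split, only the "html" results go through the href parser, and the "pdf" and "postscript" results can be saved to disk for separate handling.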
    Last edited by whit3_cryst4l; 17-04-2002 at 09:43 AM.
    ~ k | r | i | s | t | a | l ^ p | u | t | i | h ~
