Using PHP to read a HTML page and output modified HTML page?

Sharky Forums


  1. #1
    Catfish
    Join Date
    Sep 2000
    Location
    Wales, UK
    Posts
    140

    Hey peeps.

    I want to read in an HTML page using PHP, but instead of outputting it unchanged I want to essentially remove any fancy stuff and output it as simple HTML 1 / text on the screen.

    Any ideas of how i could achieve this?

    Cheers
    Jon

  2. #2
    Tiger Shark GrayCalx's Avatar
    Join Date
    Nov 2000
    Location
    Northern VA
    Posts
    569
    I know in ASP you can call a remote File Scripting Object, open the file, read each line, then parse it and print it out to the screen or save it back to a file. As far as PHP goes, you're going to have to see if they have any kind of built-in file system objects out there.
    XP 1600+
    MSI KT333
    512 PC2700
    AOpen Geforce4 4200

  3. #3
    Ursus Arctos Moderatis Grizzly's Avatar
    Join Date
    Sep 2000
    Location
    Providence, RI USA
    Posts
    3,077
    Well, if you think about it, you're looking to "filter out" certain HTML tags, but leave some basic ones in. To filter out all tags, except for a given few, would require some fairly advanced usage of regular expressions.

    The biggest complication in my mind would be dealing with HTML tags that are "container" tags, and those that aren't. PHP's preg_replace() would be a good place to start in terms of regular expression replacement/filtering. As far as reading an HTML file, the fopen() function is usually enough to get the job done.
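
    As a rough sketch of that idea (not a definitive implementation): read the file with fopen(), then use preg_replace() to drop every tag that is not on a keep list. The helper names read_html() and filter_tags() are made up for illustration, and the keep list is assumed to be non-empty. Opening a URL this way also assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: read a whole HTML file (or a URL, if allow_url_fopen is on).
function read_html(string $source): string
{
    $handle = fopen($source, 'r');
    if ($handle === false) {
        throw new RuntimeException("Could not open $source");
    }
    $html = stream_get_contents($handle);
    fclose($handle);
    return $html === false ? '' : $html;
}

// Hypothetical helper: remove every tag whose name is not in $keep.
// Assumes $keep is a non-empty list of plain tag names such as ['b', 'i', 'a'].
function filter_tags(string $html, array $keep): string
{
    $names = implode('|', array_map('preg_quote', $keep));
    // Negative lookahead: match any opening or closing tag
    // unless its name is one of the names on the keep list.
    $pattern = '~<(?!/?(?:' . $names . ')\b)[^>]*>~i';
    return preg_replace($pattern, '', $html);
}

$sample = '<div class="x"><b>Hi</b> <font size="2">there</font><br></div>';
echo filter_tags($sample, ['b', 'i', 'a']); // prints: <b>Hi</b> there
```

    Note this only removes the tags themselves. Text inside removed container tags (including the contents of script or style blocks) is left behind, which is the "container" complication mentioned above.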

  4. #4
    Catfish
    Join Date
    Sep 2000
    Location
    Wales, UK
    Posts
    140
    well i managed to read the file in like this

    Code:
    $ret = file($url);
    $i = 0;
    while ($i < count($ret)) {
        print($ret[$i]);
        $i = $i + 1;
    }
    Not pretty but hey..

    Okay so regular expressions to alter the contents of this.

    Mesa feels the time for some research coming on LOL

    Cheers peeps
    Jon

  5. #5
    Tiger Shark GrayCalx's Avatar
    Join Date
    Nov 2000
    Location
    Northern VA
    Posts
    569
    It's really going to depend on how general these input HTML files are going to be. If you're talking about something like replacing "<strong>" with "<b>", it's going to be a simple matter of find and replace (and again I'm not familiar with PHP, but I assume they have some kind of replace function, right Grizzly?)

    What you need to think about though is what Grizzly was talking about. If you're trying to replace, say, "<div>" tags, you have no idea if the syntax will be "<div>" or "<div style=" or "< div >". If you're trying to be general enough to deal with every possible situation, you'll have to get down to the character level: read in one character at a time, adding it to some container variable, removing white space as you go, and compare it to some kind of hash/reference table containing each tag you're replacing.
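
    For what it's worth, here is a hedged sketch of the lookup-table idea in PHP. It swaps in a regular expression with preg_replace_callback() instead of a literal character-by-character loop, so it is a different technique aimed at the same goal. The table contents and the name remap_tags() are assumptions for illustration only.

```php
<?php
// Hypothetical lookup table: source tag name => simpler replacement.
// An empty string means "drop the tag entirely".
const TAG_MAP = [
    'strong' => 'b',
    'em'     => 'i',
    'div'    => 'p',
    'span'   => '',
];

function remap_tags(string $html, array $map = TAG_MAP): string
{
    // Group 1 = optional "/", group 2 = tag name. Stray whitespace is tolerated,
    // and any attributes up to ">" are matched but not kept for remapped tags.
    $pattern = '~<\s*(/?)\s*([a-z][a-z0-9]*)\b[^>]*>~i';
    return preg_replace_callback($pattern, function (array $m) use ($map): string {
        $name = strtolower($m[2]);
        if (!array_key_exists($name, $map)) {
            return $m[0]; // not in the table: leave the tag untouched
        }
        return $map[$name] === '' ? '' : '<' . $m[1] . $map[$name] . '>';
    }, $html);
}

echo remap_tags('< div ><strong>Hi</strong><div style="color:red">x</div ><u>y</u>');
// prints: <p><b>Hi</b><p>x</p><u>y</u>
```

    An attribute value containing a literal ">" would trip this pattern up, so treat it as a starting point rather than a full parser.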

    Can I ask what you're doing this for? What's the goal?

  6. #6
    Expensive Sushi Jedi Legend's Avatar
    Join Date
    Mar 2003
    Location
    Kansas, United States
    Posts
    19
    By using strip_tags() you can strip out unwanted tags, and the second argument lets you list the tags that you want to keep.

    So..

    PHP Code:
    $ret   = file($url);
    $allow = "<b><u><i><a>";

    foreach ($ret as $value) {
        echo strip_tags($value, $allow); /* Echoes the file with all tags stripped except the good ones in $allow */
    }

    Oh yes, and foreach() is better than while() when dealing with arrays, if you want cleaner code.
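
    One possible variation, sketched under assumptions: run strip_tags() over the whole page as a single string rather than line by line, since a tag can be split across two lines in the source and may then not be seen as one tag. The names simplify_html() and simplify_page() are made up for this example; fetching a URL with file_get_contents() assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper name; strip_tags() itself is a real PHP built-in.
function simplify_html(string $html, string $allow = '<b><u><i><a>'): string
{
    // Every tag not listed in $allow is removed; the text between tags stays.
    return strip_tags($html, $allow);
}

// Hypothetical wrapper: fetch the whole page as one string, then simplify it.
function simplify_page(string $url): string
{
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not read $url");
    }
    return simplify_html($html);
}

echo simplify_html('<div><b>bold</b> <font color="red">red</font> <a href="x.html">link</a></div>');
// prints: <b>bold</b> red <a href="x.html">link</a>
```

    Keep in mind that strip_tags() leaves the attributes on allowed tags alone, so things like style or onclick on an allowed <a> will still come through.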
    Last edited by Jedi Legend; 03-06-2003 at 05:45 PM.

  7. #7
    Catfish
    Join Date
    Sep 2000
    Location
    Wales, UK
    Posts
    140
    hey guys, cheers for all the help here

    What I am trying to do is essentially make a portal so a mate can view pages as simple HTML 1 instead of needing a higher-level browser. It was really more of a see-if-I-can-get-it-working type thing, but I failed a short way in LOL

    i found this which i thought would do what i wanted..
    http://www.jazarsoft.com/products/2.php

    but i couldn't figure out where to add the allow tags etc. Have asked them how to do it so waiting for a response as we speak.

    Cheers Jedi Legend, did what you said and it is working, so that is cool... I'd still be interested to see what jazarsoft say tho, to get that HTML parser working, as that looked quite funkeh

    Going to keep fiddling to see if i can make the output more readable/prettier

    Laters
    Jon
