Using PHP to read a HTML page and output modified HTML page?

03-05-2003, 08:01 AM
LaughingJon

Hey peeps.

I want to read in an HTML page using PHP, but instead of outputting it unchanged I want to essentially remove any fancy stuff and output it as simple HTML 1 / text on the screen.

Any ideas on how I could achieve this?

Cheers
Jon
03-05-2003, 03:11 PM
GrayCalx

I know in ASP you can call a remote File System Object, open the file, read each line, then parse it and print it out to the screen or save it back to a file. As far as PHP goes, you're going to have to see if they have any kind of built-in file system objects.
03-05-2003, 08:02 PM
Grizzly

Well, if you think about it, you're looking to "filter out" certain HTML tags, but leave some basic ones in. To filter out all tags, except for a given few, would require some fairly advanced usage of regular expressions.

The biggest complication in my mind would be dealing with HTML tags that are "container" tags, and those that aren't. PHP's preg_replace() would be a good place to start in terms of regular expression replacement/filtering. As far as reading an HTML file, the fopen() function is usually enough to get the job done.
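
A rough sketch of that idea, assuming a whitelist of plain tag names: fopen() reads the page and preg_replace() removes every tag that is not on the list. The function names read_html_file() and keep_only_tags() are made up for illustration. The pattern treats opening and closing tags the same way, so it does not try to tell container tags from the rest, and it will misfire on a stray "<" in ordinary text or a ">" inside an attribute value.

```php
<?php
// Read a whole file into a string with fopen(). A URL should also work
// if allow_url_fopen is enabled in php.ini.
function read_html_file($path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return '';
    }
    $html = stream_get_contents($handle);
    fclose($handle);
    return $html === false ? '' : $html;
}

// Hypothetical helper: remove every tag whose name is not in $allowed.
function keep_only_tags($html, array $allowed)
{
    if (count($allowed) === 0) {
        // Nothing is allowed, so drop every tag.
        return preg_replace('~<[^>]*>~', '', $html);
    }
    $names = implode('|', array_map('preg_quote', $allowed));
    // Match any tag unless its name (optionally after "/") is on the list.
    // The \b stops "b" from also allowing "br" or "body".
    $pattern = '~<(?!/?(?:' . $names . ')\b)[^>]*>~i';
    return preg_replace($pattern, '', $html);
}
```

Something like keep_only_tags(read_html_file($url), array('b', 'i', 'u', 'a')) would then give back the page with only those four tags left in. The text inside removed tags (including script and style blocks) stays in the output.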
03-06-2003, 05:05 AM
LaughingJon

Well, I managed to read the file in like this:

Code:

$ret = file($url);
$i = 0;
while ($i < count($ret)) {
    print($ret[$i]);
    $i = $i + 1;
}

Not pretty but hey..

Okay, so next up is regular expressions to alter the contents of this.

Mesa feels the time for some research coming on LOL :eek:

Cheers peeps
Jon
03-06-2003, 04:54 PM
GrayCalx

It's really going to depend on how general the input HTML files are going to be. If you're talking about something like replacing "<strong>" with "<b>", it's going to be a simple matter of find and replace (and again I'm not familiar with PHP, but I assume they have some kind of replace function, right Grizzly?)

What you need to think about, though, is what Grizzly was talking about. If you're trying to replace, say, "<div>" tags, you have no idea if the syntax will be "<div>" or "<div style=" or "< div >". If you're trying to be general enough to deal with every possible situation, you'll have to get down to the character level: read in one character at a time, adding it to some container variable, removing white space as you go, and compare it to some kind of hash/reference table containing each tag you're replacing.

Can I ask what you're doing this for? What's the goal?
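
For what it's worth, here is a hedged PHP sketch of the lookup-table idea. It swaps the character-by-character scan for a single regular expression that tolerates stray spaces, attributes and mixed case, then looks each tag name up in a table. The $tagMap contents and the remap_tags() name are made up for illustration, and attributes on remapped tags are simply dropped.

```php
<?php
// Hypothetical lookup table: source tag name => replacement tag name.
$tagMap = array('strong' => 'b', 'em' => 'i');

function remap_tags($html, array $tagMap)
{
    return preg_replace_callback(
        // "<", optional spaces, optional "/", optional spaces, the tag
        // name, then anything up to the closing ">".
        '~<\s*(/?)\s*([a-z][a-z0-9]*)\b[^>]*>~i',
        function ($m) use ($tagMap) {
            $name = strtolower($m[2]);
            if (!isset($tagMap[$name])) {
                return $m[0]; // not in the table: leave the tag untouched
            }
            // Rebuild the tag with the new name; attributes are dropped.
            return '<' . $m[1] . $tagMap[$name] . '>';
        },
        $html
    );
}
```

For the plain "<strong>" to "<b>" case with no variations, str_replace() on the opening and closing tags would be enough. Like any regex approach, this one gets confused by a ">" inside an attribute value.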
03-06-2003, 05:42 PM
Jedi Legend

By using strip_tags() you can strip out unwanted tags; the second argument lets you list the tags that you want to keep.

So..

PHP Code:

$ret = file($url);
$allow = "<b><u><i><a>";
foreach ($ret as $value) {
    /* Echoes the file with all tags stripped except the good ones in $allow */
    echo strip_tags($value, $allow);
}

Oh yes, and foreach() is better than while() when dealing with arrays, if you want cleaner code.
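
One possible variation, sketched under the assumption that the page fits comfortably in memory: run strip_tags() over the whole document in one go rather than line by line, so a tag that is split across two lines is still treated as a single tag. The simplify_html() name is made up for illustration.

```php
<?php
// Strip every tag except those listed in $allow, working on the whole
// document at once instead of one line at a time.
function simplify_html($html, $allow = '<b><u><i><a>')
{
    return strip_tags($html, $allow);
}

// Usage, assuming $url holds the page address as in the snippet above:
// echo simplify_html(file_get_contents($url));
```

Bear in mind that strip_tags() leaves the attributes of allowed tags as they are and keeps the text inside removed tags, so script and style contents will still show up as plain text.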
03-07-2003, 07:58 AM
LaughingJon

Hey guys, cheers for all the help here :)

What I am trying to do is essentially make a portal for a mate to view pages in HTML 1 rather than needing a higher-level browser. It was really more of a see-if-I-can-get-it-working type thing, but I failed a short way through LOL :D

I found this, which I thought would do what I wanted:
http://www.jazarsoft.com/products/2.php

but I couldn't figure out where to add the allowed tags etc. I have asked them how to do it, so I'm waiting for a response as we speak.

Cheers Jedi Legend, I did what you said and it is working, so that is cool... I'd still be interested to see what jazarsoft say, though, about getting that HTML parser working, as it looked quite funkeh :D

Going to keep fiddling to see if I can make the output more readable/prettier.

Laters
Jon