Hey peeps.
I want to read in an HTML page using PHP, but instead of outputting it as-is I want to essentially strip out any fancy stuff and output it as simple HTML 1 / text on the screen.
Any ideas on how I could achieve this?
Cheers
Jon
I know in ASP you can call a remote File Scripting Object, open the file, read each line, then parse it and print it out to the screen or save it back to a file. As far as PHP goes, you're going to have to see if they have any kind of built-in file system objects out there.
Well, if you think about it, you're looking to "filter out" certain HTML tags, but leave some basic ones in. To filter out all tags, except for a given few, would require some fairly advanced usage of regular expressions.
The biggest complication in my mind would be dealing with HTML tags that are "container" tags, and those that aren't. PHP's preg_replace() would be a good place to start in terms of regular expression replacement/filtering. As far as reading an HTML file, the fopen() function is usually enough to get the job done.
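To make the fopen() + preg_replace() idea above a bit more concrete, here is a minimal sketch. The function names readAll() and filterTags() and the whitelist array are made up for illustration, not anything from the thread. It only drops the tags themselves and keeps whatever text sat between them, so it does not solve the "container" tag problem, and a ">" inside an attribute value would confuse the pattern.

```php
<?php
// Hypothetical helper: read a whole file (or URL, if allow_url_fopen is on)
// chunk by chunk using fopen()/fread().
function readAll($path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    $contents = '';
    while (!feof($handle)) {
        $contents .= fread($handle, 8192);
    }
    fclose($handle);
    return $contents;
}

// Hypothetical helper: remove every tag whose name is NOT in $keep.
// The negative lookahead skips tags like <b>, </b> or <a href="...">.
function filterTags($html, $keep)
{
    $names = implode('|', array_map('preg_quote', $keep));
    return preg_replace('~<(?!/?(?:' . $names . ')\b)[^>]*>~i', '', $html);
}

// Prints: <b>Hi</b> there
echo filterTags('<div class="x"><b>Hi</b> there<br></div>', array('b', 'i', 'u', 'a'));
```

For a real page you would feed it something like filterTags(readAll($url), array('b', 'i', 'u', 'a')), assuming $url points at something PHP is allowed to open.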
well I managed to read the file in like this
Not pretty but hey..
Code:
$ret = file($url);
$i = 0;
while ($i < count($ret)) {
    print($ret[$i]);
    $i = $i + 1;
}
Okay so regular expressions to alter the contents of this.
Mesa feels the time for some research coming on LOL :eek:
Cheers peeps
Jon
It's really going to depend on how general these input HTML files are going to be. If you're talking about something like replacing "<strong>" with "<b>", it's going to be a simple matter of find and replace (and again I'm not familiar with PHP, but I assume they have some kind of replace function, right Grizzly?)
What you need to think about though is what Grizzly was talking about. If you're trying to replace, say, "<div>" tags, you have no idea if the syntax will be "<div>" or "<div style=" or "< div >". If you're trying to be general enough to deal with every possible situation, you'll have to get down to the character level: read in one character at a time, adding it to some container variable, removing white space as you go, and compare it to some kind of hash/reference table containing each tag you're replacing.
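As a rough illustration of the "reference table" idea, here is a sketch that lets a regular expression cope with the stray whitespace and attributes instead of walking character by character. The $tagMap table and the simplifyTags() name are assumptions for the example; it uses PHP's preg_replace_callback() and would still trip over a ">" inside an attribute value.

```php
<?php
// Hypothetical lookup table: tag to replace => simpler tag to emit.
$tagMap = array('strong' => 'b', 'em' => 'i');

function simplifyTags($html, $tagMap)
{
    // Match an opening or closing tag, tolerating "< div >" style
    // whitespace and any attributes after the tag name.
    return preg_replace_callback(
        '~<\s*(/?)\s*([a-z][a-z0-9]*)\b[^>]*>~i',
        function ($m) use ($tagMap) {
            $name = strtolower($m[2]);
            if (!isset($tagMap[$name])) {
                return $m[0]; // not in the table: leave the tag untouched
            }
            // Rebuild the tag with the mapped name, dropping attributes.
            return '<' . $m[1] . $tagMap[$name] . '>';
        },
        $html
    );
}

// Prints: <b>Hi</b> <i>there</i> <div>x</div>
echo simplifyTags('< strong >Hi</strong> <em class="x">there</em> <div>x</div>', $tagMap);
```

If the input really is just "<strong>" with no variations, a plain str_replace() would do the same job with less machinery.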
Can I ask what you're doing this for? What's the goal?
By using strip_tags() you can strip out unwanted tags, and the second argument lets you list the tags you want to allow.
So..
PHP Code:
$ret = file($url);
$allow = "<b><u><i><a>";
foreach ($ret as $value) {
    echo strip_tags($value, $allow); /* Echoes the file with all tags stripped except the good ones in $allow */
}
Oh yes, and foreach() is better than while() when dealing with arrays, if you want cleaner code.
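One thing to watch with strip_tags(): it removes the tags but keeps the text between them, so the contents of script and style blocks end up printed on the page. Running it line by line can also go wrong when a tag is split across two lines. A possible variation, assuming you are happy to work on the whole page as one string, is sketched below; toSimpleHtml() is a made-up name and the regex for script/style blocks is only a rough first pass.

```php
<?php
// Hypothetical helper: reduce a page to plain text plus a few allowed tags.
function toSimpleHtml($html, $allow = '<b><u><i><a>')
{
    // Drop <script> and <style> blocks including their contents,
    // since strip_tags() on its own would leave the code/CSS behind as text.
    $html = preg_replace('~<(script|style)\b[^>]*>.*?</\1\s*>~is', '', $html);
    return strip_tags($html, $allow);
}

$sample = "<html><head><style>p { color: red; }</style></head>"
        . "<body><p>Hello <b>world</b></p><script>alert('x');</script></body></html>";

// Prints: Hello <b>world</b>
echo toSimpleHtml($sample);
```

With a real page it would be something like echo toSimpleHtml(file_get_contents($url)); which, like file($url), only works on remote addresses when allow_url_fopen is enabled.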
hey guys, cheers for all the help here :)
What I am trying to do is essentially make a portal so a mate can view pages as plain HTML 1 rather than needing a higher-level browser. Really it was more of a see-if-I-can-get-it-working type thing, but I failed a short way through LOL :D
I found this, which I thought would do what I wanted..
http://www.jazarsoft.com/products/2.php
but I couldn't figure out where to add the allowed tags etc. Have asked them how to do it, so waiting for a response as we speak.
Cheers Jedi Legend, did what you said and it is working, so that is cool... I'd still be interested to see what jazarsoft say tho, to get that HTML parser working, as that looked quite funkeh :D
Going to keep fiddling to see if I can make the output more readable/prettier
Laters
Jon