-
Using PHP to read an HTML page and output a modified HTML page?
Hey peeps.
I want to read in an HTML page using PHP, but instead of outputting it unchanged I want to essentially remove any fancy stuff and output it as simple HTML 1 / plain text on the screen.
Any ideas on how I could achieve this?
Cheers
Jon
-
Tiger Shark
I know in ASP you can create a FileSystemObject, open the file, read each line, then parse it and print it out to the screen or save it back to a file. As far as PHP goes, you're going to have to see if they have any kind of file system functions built in.
XP 1600+
MSI KT333
512 PC2700
AOpen Geforce4 4200
-
Ursus Arctos Moderatis
Well, if you think about it, you're looking to "filter out" certain HTML tags, but leave some basic ones in. To filter out all tags, except for a given few, would require some fairly advanced usage of regular expressions.
The biggest complication in my mind would be dealing with HTML tags that are "container" tags, and those that aren't. PHP's preg_replace() would be a good place to start in terms of regular expression replacement/filtering. As far as reading an HTML file, the fopen() function is usually enough to get the job done.
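A rough sketch of the preg_replace() idea, assuming the goal is to drop every tag except a short keep-list. The keep-list and the sample markup here are made up for illustration, and the pattern is not a real HTML parser: comments, scripts, and a ">" inside an attribute value would all trip it up.

```php
<?php
// Tags to keep; everything else is removed. This list is only an example.
$keep = 'b|i|u|p|br';

// The negative lookahead (?!...) refuses to match a kept tag, opening or closing,
// so only the unwanted tags are matched and replaced with an empty string.
$pattern = '/<(?!\/?(?:' . $keep . ')\b)[^>]*>/i';

// Sample markup standing in for the contents of the page that was read in.
$html = '<div class="box"><p>Hello <b>world</b></p><br></div>';

$filtered = preg_replace($pattern, '', $html);
echo $filtered, "\n";
```

The text inside a removed tag pair is left alone; only the tags themselves disappear.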
-
Well, I managed to read the file in like this:
Code:
// file() returns the page as an array of lines
$ret = file($url);
$i = 0;
while ($i < count($ret)) {
    print($ret[$i]);
    $i = $i + 1;
}
Not pretty but hey..
Okay so regular expressions to alter the contents of this.
Mesa feels the time for some research coming on LOL 
Cheers peeps
Jon
-
Tiger Shark
It's really going to depend on how general these input HTML files are going to be. If you're talking about something like replacing "<strong>" with "<b>", it's going to be a simple matter of find and replace (and again I'm not familiar with PHP, but I assume they have some kind of replace function, right Grizzly?)
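For the simple find-and-replace case, PHP does have a replace function: str_replace(). A minimal sketch, assuming the tags appear exactly as written (lowercase, no attributes); the sample markup and variable names are only for illustration.

```php
<?php
// Sample markup; in the thread this would come from the page that was read in.
$html = '<p><strong>Bold</strong> and <em>italic</em></p>';

// str_replace() accepts parallel arrays: each entry in $search is swapped for
// the entry at the same position in $replace.
$search  = ['<strong>', '</strong>', '<em>', '</em>'];
$replace = ['<b>', '</b>', '<i>', '</i>'];

$simple = str_replace($search, $replace, $html);
echo $simple, "\n";
```

str_replace() is case-sensitive, so "<STRONG>" would be missed; str_ireplace() is the case-insensitive variant.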
What you need to think about though is what Grizzly was talking about. If you're trying to replace, say, "<div>" tags, you have no idea if the syntax will be "<div>" or "<div style=" or "< div >". If you're trying to be general enough to deal with every possible situation, you'll have to get down to the character level: read in one character at a time, adding it to some container variable, removing white space as you go, and compare it to some kind of hash/reference table containing each tag you're replacing.
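The reference-table idea could be sketched like this. Note that it swaps the character-by-character scan for a regular expression with preg_replace_callback(), which is a different technique from the one described above. The $map table, the replace_tags() name, and the sample markup are all hypothetical, and attributes on rewritten tags are simply dropped.

```php
<?php
// Hypothetical lookup table: old tag name => replacement tag name.
$map = ['strong' => 'b', 'em' => 'i', 'div' => 'p'];

// Rewrite each tag whose name appears in $map; other tags pass through unchanged.
function replace_tags(string $html, array $map): string
{
    // Group 1: optional "/" of a closing tag. Group 2: the tag name.
    // The \s* parts tolerate stray white space such as "< div >".
    $pattern = '/<\s*(\/?)\s*([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/';

    return preg_replace_callback($pattern, function (array $m) use ($map) {
        $name = strtolower($m[2]);
        // Rebuild the tag from the table entry, discarding any attributes.
        return isset($map[$name]) ? '<' . $m[1] . $map[$name] . '>' : $m[0];
    }, $html);
}

$html = '<div>one</div><div style="color:red">two</div>< div >three</ div >';
$converted = replace_tags($html, $map);
echo $converted, "\n";
```

All three spellings of the div tag end up as plain "<p>" and "</p>" here.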
Can I ask what you're doing this for? What's the goal?
-
Expensive Sushi
By using strip_tags() you can strip out unwanted tags, and the second argument lets you list the tags that you want to keep.
So..
PHP Code:
$ret = file($url);
$allow = "<b><u><i><a>";
foreach ($ret as $value) {
    echo strip_tags($value, $allow); /* Echoes the file with all tags stripped except the good ones in $allow */
}
Oh yes, and foreach() is better than while() when dealing with arrays, if you want cleaner code.
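One variation, offered as a sketch rather than a correction: running strip_tags() over the whole page as a single string, which avoids trouble when a tag happens to be split across two lines. The sample markup is made up; for a real page the string could come from file_get_contents($url) instead.

```php
<?php
// Sample page as one string; the <a> tag spans two lines on purpose.
// For a real page this could be: $html = file_get_contents($url);
$html = "<div class=\"nav\"><a href=\"/home\"\n   title=\"Home\">Home</a></div>\n"
      . "<p>Some <b>bold</b> text</p>";

// Same allow-list as in the post above.
$allow = '<b><u><i><a>';

// Tags not in $allow are removed; allowed tags are kept as written,
// including any attributes they carry.
$clean = strip_tags($html, $allow);
echo $clean, "\n";
```

Since allowed tags keep their attributes, things like style or onclick on an allowed tag will still come through.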
Last edited by Jedi Legend; 03-06-2003 at 05:45 PM.