Aioforum.com
05-23-2008, 07:06 AM   #1
hacks
Join Date: Oct 2007
Posts: 1,797
Awards: Member of the Month
Get All URLs on a Page

In this article, I show a class that can be used to find and display all of the URLs on a web page. What for, you may ask? Well, in my experience as a web developer, I have found a class like this to be very useful. Sometimes you may want to use this class as a basis for a more complex application that crawls your site checking for bad or broken links. In other cases, you may simply want to check an individual page to make sure your links are formatted correctly and don’t point to any obsolete pages. You could also easily change this class to look for other items within your page, like specific text or tags. Who knows, this may be the start of a specialized spider that crawls sites on the internet looking for something specific.

I think you get the picture. Of course, to make this class do all those wonderful things, you would have to expand on what I am presenting here. However, I believe this is a good start. The class has one public method, RetrieveUrls, which calls two private methods. The RetrieveContent method issues a request to the web page and retrieves its contents. The GetAllUrls method uses a regular expression to find all of the URLs on the page. It writes each match to the console and also appends it to a log file through the private WriteToLog helper. Of course, if you prefer, you could modify the method to save the matches somewhere else, like an array or a database table.
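If you would rather keep the matches in memory than print and log them, the loop in GetAllUrls can collect them into a list instead. Below is a minimal sketch under a few assumptions: UrlCollector and CollectUrls are hypothetical names that are not part of the original class, and the pattern is an assumed reconstruction of the one in GetAllUrls. The URL is captured as group 1 by a plain (.*?), and the run of leading quotes and whitespace is wrapped in an atomic group (?>...) so that excluded links such as # or mailto: are skipped instead of coming back as empty matches.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original GetUrls class): runs an
// href-matching regex over page content and returns the captured URLs
// as a list instead of writing them to the console and a log file.
public static class UrlCollector
{
    // Assumed reconstruction of the GetAllUrls pattern:
    // - (?>[\s""']*) is an atomic group, so the engine cannot give back the
    //   opening quote to slip a #, mailto or javascript link past (?!...);
    // - (.*?) captures the URL itself as group 1.
    private static readonly Regex HrefRegex = new Regex(
        @"(?:href\s*=)(?>[\s""']*)(?!#|mailto|location.|javascript|.*css|.*this\.)(.*?)(?:[\s>""'])",
        RegexOptions.IgnoreCase);

    public static List<string> CollectUrls(string content)
    {
        var urls = new List<string>();
        // Matches() returns every non-overlapping match, in document order
        foreach (Match match in HrefRegex.Matches(content))
        {
            urls.Add(match.Groups[1].Value);
        }
        return urls;
    }
}
```

For example, for a page containing the links href="/b.html" and href="#top", CollectUrls returns only /b.html. The list could then be stored in a database table or handed to a link checker.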

Using the code
Code:
GetUrls urls = new GetUrls();
urls.RetrieveUrls("http://www.microsoft.com");
The class is listed below. Have fun!

Code:
using System;
using System.Collections.Generic;
using System.Text;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

namespace FindAllUrls
{
    class GetUrls
    {
        //public method called from your application
        public void RetrieveUrls(string webPage)
        {
            GetAllUrls(RetrieveContent(webPage));
        }

        //get the content of the web page passed in
        private string RetrieveContent(string webPage)
        {
            HttpWebResponse response = null;//used to get response
            StreamReader respStream = null;//used to read response into string
            try
            {
                //create a request object using the url passed in
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(webPage);
                request.Timeout = 10000; //milliseconds

                //go get a response from the page
                response = (HttpWebResponse)request.GetResponse();

                //create a streamreader object from the response
                respStream = new StreamReader(response.GetResponseStream());

                //get the contents of the page as a string and return it
                return respStream.ReadToEnd();
            }
            catch (Exception)//houston we have a problem!
            {
                //"throw;" rethrows without resetting the stack trace ("throw ex;" would)
                throw;
            }
            finally
            {
                //close it down, we're going home!
                //null checks: if the request failed, these were never assigned
                if (respStream != null) respStream.Close();
                if (response != null) response.Close();
            }
        }

        //using a regular expression, find all of the href or urls
        //in the content of the page
        private void GetAllUrls(string content)
        {
            //regular expression; group 1 captures the url itself.
            //(?>...) is an atomic group, so the opening quote cannot be given
            //back to slip a #, mailto or javascript link past the (?!...) check
            string pattern = @"(?:href\s*=)(?>[\s""']*)(?!#|mailto|location.|javascript|.*css|.*this\.)(.*?)(?:[\s>""'])";

            //Set up regex object
            Regex RegExpr = new Regex(pattern, RegexOptions.IgnoreCase);

            //get the first match
            Match match = RegExpr.Match(content);

            //loop through matches
            while (match.Success)
            {
                //output the match info (verbatim @"" path: "C:\m" is not a valid escape)
                Console.WriteLine("href match: " + match.Groups[0].Value);
                WriteToLog(@"C:\matchlog.txt", "href match: " + match.Groups[0].Value + "\r\n");

                Console.WriteLine("Url match: " + match.Groups[1].Value);
                WriteToLog(@"C:\matchlog.txt", "Url | Location | mailto match: " + match.Groups[1].Value + "\r\n");

                //get next match
                match = match.NextMatch();
            }
        }

        //Write to a log file
        private void WriteToLog(string file, string message)
        {
            //using disposes (and therefore closes) the writer; no explicit Close() needed
            using (StreamWriter w = File.AppendText(file))
            {
                w.WriteLine(DateTime.Now.ToString() + ": " + message);
            }
        }
    }
}
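The values GetAllUrls captures are the raw href attribute values, so many of them will be relative (page2.html, /about.html). To turn the class into the broken-link crawler mentioned at the start, each value has to be resolved against the address of the page it came from, and usually only links on your own site should be followed. Here is a minimal sketch of that step. LinkResolver and ResolveSameHost are hypothetical names, not part of the original class, and the sketch assumes only http and https links are of interest.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the original article): turns raw href
// values into absolute URLs and keeps only those on the same host as the
// page they came from, e.g. as input for a broken-link checker.
public static class LinkResolver
{
    public static List<string> ResolveSameHost(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();
        foreach (string href in hrefs)
        {
            Uri resolved;
            // absolute http/https links are used as-is
            if (Uri.TryCreate(href, UriKind.Absolute, out Uri absolute) &&
                (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            {
                resolved = absolute;
            }
            // relative references ("page2.html", "/about.html", "../img/logo.png")
            // are resolved against the page address
            else if (Uri.TryCreate(href, UriKind.Relative, out Uri relative))
            {
                resolved = new Uri(baseUri, relative);
            }
            else
            {
                continue; // other schemes (ftp:, mailto:) or unparseable values
            }

            // keep only links on the same host as the page being checked
            if (string.Equals(resolved.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase))
                result.Add(resolved.AbsoluteUri);
        }
        return result;
    }
}
```

For the page http://www.example.com/docs/index.html, the values page2.html and ../img/logo.png resolve to http://www.example.com/docs/page2.html and http://www.example.com/img/logo.png, while a link to another host is dropped. Each resulting URL could then be passed to RetrieveUrls, or requested on its own to see whether it still exists.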

The Following User Says Thank You to hacks For This Useful Post:
barnick (06-28-2008)
09-03-2008, 11:03 AM   #2
chan001
Join Date: Sep 2008
Location: Philippines
Age: 20
Posts: 29


What is this for? Can you explain in more detail how it works?

Thanks in advance, I want to know this...
09-17-2008, 10:06 AM   #3
hacks
It is a basic tool for uploading an image file from one website to another in an easy way. There is no need to save the image to your hard disk to use it on the other website.
