(IT World)   New tool lets you search website source code, kill yet more time at work   (itworld.com)
    More: Interesting, search engines, Google Analytics, plain text, source codes, comparative  

2671 clicks; posted to Geek on 08 Oct 2013 at 10:44 AM



47 Comments
   
 
2013-10-08 10:20:23 AM
$100/month? No thanks, failmitter.
 
2013-10-08 10:49:35 AM
20 free searches for  the  reading comprehension affected.
//not subby
 
2013-10-08 10:54:50 AM
Is this different from Chrome's "View Source" or "Inspect Element?"
 
2013-10-08 10:55:58 AM
Ah.  I think I see.  Not sure if I'd need it, though.
 
2013-10-08 11:02:56 AM

gnosis301: Is this different from Chrome's "View Source" or "Inspect Element?"


Yes. This costs $100.
 
2013-10-08 11:05:31 AM
Ummm..I can do this for free using Google's "intext" search operator.

For example, to find TFA by searching its META tag, do the following Google search:

intext:meta Google Adsense, Adsense, Twitter, GitHub, Google BigQuery, BigQuery

TFA is the first link.


The rest of the google search operators
 
2013-10-08 11:06:50 AM

Gonz: gnosis301: Is this different from Chrome's "View Source" or "Inspect Element?"

Yes. This costs $100.


"View Source" and "Inspect Element" are not search tools.
 
2013-10-08 11:14:59 AM

fatbear: Ummm..I can do this for free using Google's "intext" search operator.

For example, to find TFA by searching its META tag, do the following Google search:

intext:meta Google Adsense, Adsense, Twitter, GitHub, Google BigQuery, BigQuery

TFA is the first link.


The rest of the google search operators


If you drop the intext operator, it's still the first hit.  It's pretty easy to find something with a search engine when you actually have the page content right in front of you, as it happens.

Also, you have to either use allintext or put intext in front of every term - the only term you applied the intext operator to was "meta".

That said, intext does exactly what it says: It searches the text.  This tool is completely different.  It searches the source code.  If you were searching for sites that used, say, a particular jQuery function, this would find them.
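
To illustrate the point about operator scope, here is a minimal Python sketch of building the two query forms. The function name and parameters are made up for illustration, and the operator behavior is only what the comment above describes (intext: applies to one term, allintext: to all of them); it just assembles a string and does not talk to Google.

```python
def intext_query(terms, use_allintext=False):
    """Build a query string that restricts every term to page text."""
    # Quote multi-word terms so each is treated as a single phrase.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    if use_allintext:
        # allintext: covers every term that follows it.
        return "allintext:" + " ".join(quoted)
    # intext: only covers the term it is attached to, so repeat it per term.
    return " ".join(f"intext:{t}" for t in quoted)


terms = ["meta", "Google Adsense", "GitHub", "BigQuery"]
print(intext_query(terms))
print(intext_query(terms, use_allintext=True))
```

The first form repeats the operator for every term; the second relies on allintext: to cover the whole query.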
 
2013-10-08 11:15:39 AM
China White Tea: <stuff>

And, wow, that was a comedy of typographical errors.

To the coffee machine!
 
2013-10-08 11:23:55 AM

show me: $100/month? No thanks, failmitter.


For a full time web developer $100 a month is nothing
 
2013-10-08 11:24:22 AM

olapbill: 20 free searches for  the  reading comprehension affected.
//not subby


I saw that. Many computers come with a trial version of MS Office that works for a short time without registration. Does that mean Office is free? You, sir, are a dumbass.
 
2013-10-08 11:27:20 AM

China White Tea: fatbear: Ummm..I can do this for free using Google's "intext" search operator.

For example, to find TFA by searching its META tag, do the following Google search:

intext:meta Google Adsense, Adsense, Twitter, GitHub, Google BigQuery, BigQuery

TFA is the first link.


The rest of the google search operators

If you drop the intext operator, it's still the first hit.  It's pretty easy to find something with a search engine when you actually have the page content right in front of you, as it happens.


Well, duh. It was just an example.

Also, you have to either use allintext or put intext in front of every term - the only term you applied the intext operator to was "meta".

True.

That said, intext does exactly what it says: It searches the text.  This tool is completely different.  It searches the source code.  If you were searching for sites that used, say, a particular jQuery function, this would find them.

Ummmm....intext: *does* search the source, because the source is text. Example:
Search for intext:"bsa.async = true"

I understand that this tool will search *only* the source and will ignore content, but I can still get similar results for $0/month instead of $100.
 
2013-10-08 11:36:25 AM
Hell, we've been able to 'view source' with all browsers for years.

fatbear: Gonz: gnosis301: Is this different from Chrome's "View Source" or "Inspect Element?"

Yes. This costs $100.

"View Source" and "Inspect Element" are not search tools.


You're right, but if you are looking for something in the code of a website, 'View Source' and ctrl-f work fine
 
2013-10-08 11:38:06 AM

fatbear: Search for intext:"bsa.async = true"


That mostly returns things like stackoverflow.  My first page of Google results for that query is:

Css-tricks.com
thematictheme.com (a web themes page with sample code).
macdrifter.com, the actual displayed page is source code.
A beaconads.com page with sample code.
A stackoverflow page with sample code.

And so on and so forth.

Searching  "bsa.async = true" with the source search tool returns sites that actually have that in their source code.

By comparison, this google query:

 site:www.googleguide.com intext:"favicon.ico" 

Will return no results, even though "favicon.ico" does appear in the source for that site.
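
The difference between searching a page's text and searching its source can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample page is invented, and "visible text" is approximated with the standard library's HTMLParser by dropping tags, attributes, and script/style bodies. It says nothing about how Google or the source search tool actually index pages.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only text content, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so the result is a single readable line.
    return " ".join(" ".join(parser.parts).split())


# Invented sample page: the strings of interest live only in markup and script.
PAGE = """<html><head><title>Guide</title>
<link rel="shortcut icon" href="/favicon.ico">
<script>var bsa = {}; bsa.async = true;</script>
</head><body><p>Search operators explained.</p></body></html>"""

text = visible_text(PAGE)
print("in source:", "favicon.ico" in PAGE, "| in text:", "favicon.ico" in text)
print("in source:", "bsa.async = true" in PAGE, "| in text:", "bsa.async = true" in text)
```

A text search only ever sees something like the `text` string, which is why a string that appears only in an attribute or a script block never matches.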
 
2013-10-08 11:46:03 AM
View Source + ctrl+F

/Profit!
 
2013-10-08 11:48:21 AM

fatbear: Gonz: gnosis301: Is this different from Chrome's "View Source" or "Inspect Element?"

Yes. This costs $100.

"View Source" and "Inspect Element" are not search tools.


Ctrl+F, now cough up a hundred bucks.
 
2013-10-08 11:48:23 AM

show me: olapbill: 20 free searches for  the  reading comprehension affected.
//not subby

I saw that. Many computers come with a trial version of MS Office that works for a short time without registration. Does that mean Office is free? You, sir, are a dumbass.


..and your mother smells of elderberries.
and  the headline doesn't match your complaint anymore.
or mine.
 
2013-10-08 12:07:23 PM

dennysgod: fatbear: Gonz: gnosis301: Is this different from Chrome's "View Source" or "Inspect Element?"

Yes. This costs $100.

"View Source" and "Inspect Element" are not search tools.

Ctrl+F, now cough up a hundred bucks.


If you can use that technique to find me 100 sites that use the same snippet of code in 15 seconds or less, I'll give you  $1000.
 
2013-10-08 12:09:18 PM

abhorrent1: View Source + ctrl+F

/Profit!


Fail!
 
2013-10-08 12:11:34 PM
Damn.  I had this idea years ago.  Never did anything with it.
 
2013-10-08 12:17:07 PM

China White Tea: fatbear: Search for intext:"bsa.async = true"

That mostly returns things like stackoverflow.  My first page of Google results for that query is:

Css-tricks.com
thematictheme.com (a web themes page with sample code).
macdrifter.com, the actual displayed page is source code.
A beaconads.com page with sample code.
A stackoverflow page with sample code.

And so on and so forth.

Searching  "bsa.async = true" with the source search tool returns sites that actually have that in their source code.

By comparison, this google query:

 site:www.googleguide.com intext:"favicon.ico" 

Will return no results, even though "favicon.ico" does appear in the source for that site.


I learned something new today.

/Still think $100/mo is overpriced.
 
2013-10-08 12:25:20 PM

fatbear: /Still think $100/mo is overpriced.


No argument from me on that one.  I am definitely not their target market, although I can think of some people who might be.

Malware research comes to mind as a possible use.  If you find some malicious code, it could be useful to have a tool like this to find other instances of it, etc.
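
As a rough sketch of the "find other instances of it" idea, the Python below scans a small set of already-saved page sources for a known snippet. The corpus, site names, and function names are hypothetical, and a plain linear scan like this is only a stand-in: a service covering millions of sites would presumably use some kind of index, which the thread does not describe.

```python
def normalize(text):
    """Collapse runs of whitespace so formatting differences don't hide a match."""
    return " ".join(text.split())


def sites_containing(snippet, sources):
    """Return the sites whose raw source contains the snippet, sorted by name."""
    needle = normalize(snippet)
    return sorted(site for site, html in sources.items() if needle in normalize(html))


# Hypothetical corpus: site name -> saved page source.
SOURCES = {
    "a.example": "<script>bsa.async = true;</script>",
    "b.example": "<p>How to use <code>bsa.async</code> in your theme</p>",
    "c.example": "<script>var bsa = {};\n  bsa.async   =\n  true;</script>",
}

print(sites_containing("bsa.async = true", SOURCES))
```

Only the sites that really carry the snippet in their source come back; the page that merely talks about it does not.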
 
2013-10-08 12:27:20 PM
This looks like a great idea poorly implemented.  Subscribing for $99+ for a limited number of searches is hard to swallow.  It's hard to imagine getting $99 worth of value from the service to justify paying for it.  I couldn't even find specifics for just what the plans provide, and there's no way I'm providing my private information to set up a trial account to find out.
 
2013-10-08 12:28:48 PM

China White Tea: fatbear: /Still think $100/mo is overpriced.

No argument from me on that one.  I am definitely not their target market, although I can think of some people who might be.

Malware research comes to mind as a possible use.  If you find some malicious code, it could be useful to have a tool like this to find other instances of it, etc.


Or, perhaps a bit more nefariously, if you have an exploit targeting, say, a certain version of something like Wordpress, you could also use this to search the version number (or some other identifying feature) in source code to locate potential targets.
 
2013-10-08 12:35:51 PM
Rajish?
 
2013-10-08 12:45:36 PM

fatbear: Search for intext:"bsa.async = true"


I get a bunch of pages explaining the use of "bsa.async".
 
2013-10-08 12:46:04 PM
Web sites are notorious for stealing their layout and javascript code from other websites, even when such are copyrighted.

This tool could be used for tracking down infringers.

On the other hand, if there is some way to compromise the security of a website that depends upon particular code, then this will help find targets.
 
2013-10-08 01:19:48 PM
Also keep in mind that there are two different types of "source" code.

If you do a View Source on this page, you'll get the actual HTML source. It will include all of the text of these comments, the usernames that posted them, the dates and times they were posted on, the ads and featured links over to the right, and some information not normally visible. This is the client-side code.

But no developer actually typed up all of that. How could a developer know ahead of time what you and I would be typing in here? How could he know ahead of time what ads to display, or does s/he hand-modify the code every time anyone posts or a new ad is purchased?

No, what the developer typed was some PHP or Perl-CGI or some other server language code that sets up a basic template for a "comment" and an "ad" as well as structure for the menus, the "Also on Fark" links at the bottom, etc. This server-side code will contain the HTML to emit (once for such repeating elements as comments, ads, "Also on Fark" links, etc.), but also other stuff as well (e.g. the actual code to perform loops to repeat those items as needed with the same structure but different data) that does not show up when you do "View Source." In fact, I know of no way to reliably get access to this code via a web browser unless you know the FTP/SFTP or SSH login and password.

From what the article describes, NerdyData only searches the client-side code. Yes, Google can do that, too (I used it once to find info on some hack code that had appeared in a website, and instead of finding information about the hack itself, I found tens of thousands of websites that had been hacked the same way and thus contained the same code [including at least one well-known web security company!]).

If they had the ability to search the server-side code, that would be very impressive and well worth the money. It would also very much qualify this link for the "Scary" tag, as it'd be a bonanza for hackers, intellectual property thieves pirating how your cool website works, and corporate espionage ― not to mention raising some serious questions as to how that would even be possible, and what it says about the security of web servers in general.
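
The client-side versus server-side distinction can be sketched with a toy renderer. The comment above mentions PHP or Perl-CGI; Python is used here purely as a stand-in, and the template, field names, and sample comments are invented for illustration. The function is the part a visitor never sees, and the string it returns is what "View Source" shows.

```python
from html import escape

# Server-side template for one repeating element (hypothetical markup).
COMMENT_TEMPLATE = '<div class="comment"><b>{user}</b> <i>{when}</i><p>{body}</p></div>'


def render_thread(comments):
    """Server-side step: loop over stored comments and emit client-side HTML."""
    rows = [
        COMMENT_TEMPLATE.format(
            user=escape(c["user"]),
            when=escape(c["when"]),
            body=escape(c["body"]),
        )
        for c in comments
    ]
    return "<html><body>\n" + "\n".join(rows) + "\n</body></html>"


# Invented sample data standing in for rows from a database.
comments = [
    {"user": "alice", "when": "2013-10-08 10:54:50 AM", "body": 'Is this like "View Source"?'},
    {"user": "bob", "when": "2013-10-08 11:02:56 AM", "body": "Yes. This costs $100."},
]

page = render_thread(comments)
print(page)
```

The printed page contains the template's markup twice, once per comment, but no trace of the loop or of `render_thread` itself, which is why a tool that crawls pages can only ever search the output.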
 
2013-10-08 01:50:43 PM

China White Tea: China White Tea: fatbear: /Still think $100/mo is overpriced.

No argument from me on that one.  I am definitely not their target market, although I can think of some people who might be.

Malware research comes to mind as a possible use.  If you find some malicious code, it could be useful to have a tool like this to find other instances of it, etc.

Or, perhaps a bit more nefariously, if you have an exploit targeting, say, a certain version of something like Wordpress, you could also use this to search the version number (or some other identifying feature) in source code to locate potential targets.


In your example, I would use Google to search for Wordpress sites and then a very short script on my end could spider and parse those results for the version number. Not only that, the nefarious script would ignore robots.txt.

I seriously doubt they have any "nefarious" clients. They're not in the habit of paying for anything.
 
2013-10-08 02:24:04 PM

fatbear: Not only that, the nefarious script would ignore robots.txt.



It seems pretty unlikely that robots.txt will matter in most cases.  You're using a script to process results from Google, so the two most likely scenarios are "They have a robots.txt that excludes it from being crawled, so it's not in the result set your script is going to parse, because it wasn't on Google in the first place," or, "They have no such exclusion, so it wouldn't have interfered anyway."  

Also, not sure if you've ever run into it, but Google  does have anti-bot code that you're at least moderately likely to encounter if you're trying to scrape a few hundred thousand results from it.

Not saying this is the only way to do it, but $100 isn't so bad when you're putting it on someone else's credit card, and 650,000 results for "Wordpress 3.5.2" in <1 second with a handy "download entire resultset as a list" button is something that could definitely have a market.
 
2013-10-08 02:50:12 PM
I would search for MD5 codes if they allowed regex which the article says they don't.

I'd like to have a database of MD5 codes that appear on internet sites around the US (or world).

Pretty sure the NSA already has those databases along with insurance companies and perhaps advertisers.
 
2013-10-08 03:07:59 PM
I'm failing to see how that would be useful.
 
2013-10-08 03:22:57 PM

Honest Bender: I'm failing to see how that would be useful.


Say I was a health insurance company (maybe pre-Obamacare), and I had a million customers some with various ailments, some of those ailments undisclosed at the time of their application, and I wanted to find out which of those customers had visited various websites to discuss:

HIV
mental illness
meth, pot, heroin, adderall

Maybe I was an employer and wanted to know the same things. Maybe I was the FBI and wanted to track visitors to gun forums or infowars.

There's a privacy leak on the net (maybe huge, maybe minor), and its name is Gravatar: it is implemented as the MD5 of your email address. People who have your email address (health insurance companies, employers, relatives, NSA, FBI) can MD5 it and find forums you "gravitate" to if you provided the same email address there.  People who don't have your email address can still build networks of MD5 appearances and find that the anonymous guy at a torrent site is actually Drew Curtis, who used the same email address one day at Kickstarter or Change.org or some such place where he was less anonymous.

Right now, you cannot google the source code of a website, so you have to build your own engine to do this. I've thought about building a Wordpress or gravatar engine for this purpose. But these guys with their 140 million websites, if they scraped blogs and wordpress sites and getsatisfaction etc have all this stuff in their databases.

You can google gravatar privacy leak to find out a bit more. Some folks think it is a big leak. The company, wordpress, denies it is a leak at all. Stackexchange, IIRC, chose their own system as they felt it was a privacy leak (I may be very wrong on that.)
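
For readers who want to see what "MD5 of your email address" means concretely, here is a minimal Python sketch. It assumes the commonly documented Gravatar convention of trimming and lowercasing the address before hashing; the function names and the example address are made up, and the URL form is shown only for illustration.

```python
import hashlib


def gravatar_hash(email):
    """MD5 hex digest of the trimmed, lowercased address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email):
    """Avatar URL that a page would embed next to a comment (illustrative form)."""
    return "https://www.gravatar.com/avatar/" + gravatar_hash(email)


# Hypothetical address; different spellings normalize to the same hash.
print(gravatar_url("  Someone@Example.com "))
print(gravatar_hash("someone@example.com") == gravatar_hash("  Someone@Example.com "))
```

The same address always produces the same 32-character hash, on every site that embeds it, which is what makes the hash usable as a cross-site identifier.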
 
2013-10-08 03:32:15 PM

RoyBatty: Honest Bender: I'm failing to see how that would be useful.

Say I was a health insurance company (maybe pre-Obamacare), and I had a million customers some with various ailments, some of those ailments undisclosed at the time of their application, and I wanted to find out which of those customers had visited various websites to discuss:

HIV
mental illness
meth, pot, heroin, adderall

Maybe I was an employer and wanted to know the same things. Maybe I was the FBI and wanted to track visitors to gun forums or infowars.

There's a privacy leak on the net (maybe huge, maybe minor), and its name is Gravatar: it is implemented as the MD5 of your email address. People who have your email address (health insurance companies, employers, relatives, NSA, FBI) can MD5 it and find forums you "gravitate" to if you provided the same email address there.  People who don't have your email address can still build networks of MD5 appearances and find that the anonymous guy at a torrent site is actually Drew Curtis, who used the same email address one day at Kickstarter or Change.org or some such place where he was less anonymous.

Right now, you cannot google the source code of a website, so you have to build your own engine to do this. I've thought about building a Wordpress or gravatar engine for this purpose. But these guys with their 140 million websites, if they scraped blogs and wordpress sites and getsatisfaction etc have all this stuff in their databases.

You can google gravatar privacy leak to find out a bit more. Some folks think it is a big leak. The company, wordpress, denies it is a leak at all. Stackexchange, IIRC, chose their own system as they felt it was a privacy leak (I may be very wrong on that.)


What the hell are you babbling about?  Just MD5 your email address if you want your email address as MD5.

MD5 isn't farking magic.
 
2013-10-08 03:41:24 PM
Is the F12 key?
 
2013-10-08 03:49:42 PM
Find an online md5 calculator, say this one:

http://md5-hash-online.waraxe.us/

Type in an email address you like to use on forums.

Calculate the md5

Go to nerdy data, https://search.nerdydata.com and search for that md5, tell us what happens.

I found several of my forum posts.
 
2013-10-08 03:58:30 PM

China White Tea: $100 isn't so bad when you're putting it on someone else's credit card,


I overlooked the obvious.
 
2013-10-08 04:40:44 PM
ctrl+u
 
2013-10-08 04:44:02 PM

RoyBatty: Find an online md5 calculator, say this one:

http://md5-hash-online.waraxe.us/

Type in an email address you like to use on forums.

Calculate the md5

Go to nerdy data, https://search.nerdydata.com and search for that md5, tell us what happens.

I found several of my forum posts.


Yeah no shiat dummy.  So what?

Yes any doofus can make an "MD5 database".  Coming up with email addresses is easy because you just need to do something like "roy­b­atty[nospam-﹫-backwards]lia­m­g­*com" and then MD5 it.
 
2013-10-08 04:56:53 PM

Shazam999: RoyBatty: Find an online md5 calculator, say this one:

http://md5-hash-online.waraxe.us/

Type in an email address you like to use on forums.

Calculate the md5

Go to nerdy data, https://search.nerdydata.com and search for that md5, tell us what happens.

I found several of my forum posts.

Yeah no shiat dummy.  So what?

Yes any doofus can make an "MD5 database".  Coming up with email addresses is easy because you just need to do something like "r­o­ybatt­y­[nospam-﹫-backwards]liamg*com" and then MD5 it.


Yeah, don't worry about it, I was just gaslighting you, it's not a problem.
 
2013-10-08 05:23:06 PM
RoyBatty: Bla Bla MD5 Bla Bla

I still don't see how this search tool would be useful.  Like, at all.
 
2013-10-08 05:28:49 PM
Has anyone ever made money selling services to malicious hackers? That's like selling CDs on Pirate Bay.
 
2013-10-08 05:37:16 PM
View Source

Control F.

Don't charge me for that shiat.
 
2013-10-08 05:46:31 PM
This sounds absolutely boring, whether or not you pay for it.

The only time I've ever viewed sourcecode was to figure out how to download something they didn't want downloaded, and even there I've found better ways (image capture, video capture, etc.)

I stopped mucking with code in the early 90s. I let the nerds do that boring crap.
 
2013-10-08 05:56:11 PM

Honest Bender: RoyBatty: Bla Bla MD5 Bla Bla

I still don't see how this search tool would be useful.  Like, at all.


Well, I'll repeat one example I happen to like, although after 1/1/2014 it won't be relevant (I hope).

So you buy insurance from AETNA and you don't tell AETNA you have a back problem. Six months go by and you ask AETNA to pay for lumbar fusion surgery, which google tells me costs $90K.

And it's not just you, AETNA thinks that 10% of its customers are lying to them.

When you bought insurance at AETNA you signed up at their online service, using the email address honest­*b­en­d­er­[nospam-﹫-backwards]li­amg­*c­o­m.

When you visit http://backpainsurviving.com/  you see an article you really like and you enter a comment.

That site, like so many is a wordpress site. The CEO of wordpress honestly wants wordpress to power a majority of all sites.

http://techcrunch.com/2013/09/18/mullenweg-wants-wordpress-to-power-a-majority-of-all-websites/

Wordpress tells you your email address will never be published, which is good because you know you lied about your pre-existing condition.

BUT that gravatar next to your name is an MD5 hash of honest­*b­en­d­er­[nospam-﹫-backwards]li­amg­*c­o­m and that is published. It's virtually certain that that MD5 hash uniquely identifies your email address, and if you were like many people, you entered your real email address. Not all people do. Shazzam clearly never does. I try not to. But sometimes I do if I think the blogger might want to contact me.

So one day Cindy in IT realizes that AETNA could create a web crawler to crawl all medical sites or any support group site for any kind of medical issue and compare the MD5 hashes in the blog comments with the md5 hashes of the email address of AETNA customers and so build a shadow profile of what diseases are associated with each customer.
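
The cross-referencing step described above comes down to a set intersection, which a short Python sketch can make concrete. Everything here is an assumption for illustration: the regular expression for avatar URLs, the example addresses and pages, and the function names. It is meant to show why the published hash is a leak (or to audit your own exposure), not to describe any real crawler.

```python
import hashlib
import re

# Assumed URL shape for embedded avatars: .../avatar/<32 hex chars>
AVATAR_RE = re.compile(r"gravatar\.com/avatar/([0-9a-f]{32})")


def email_hash(email):
    """MD5 hex digest of the trimmed, lowercased address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def hashes_in_source(html):
    """All avatar hashes that appear in a page's source."""
    return set(AVATAR_RE.findall(html))


def match_known_emails(known_emails, pages):
    """Map each known address to the pages whose source contains its hash."""
    by_hash = {email_hash(e): e for e in known_emails}
    hits = {}
    for url, html in pages.items():
        for h in hashes_in_source(html).intersection(by_hash):
            hits.setdefault(by_hash[h], []).append(url)
    return hits


# Hypothetical data: one known address shows up on one forum page.
customer = "pat@example.com"
pages = {
    "https://forum.example/thread/1":
        f'<img src="https://secure.gravatar.com/avatar/{email_hash(customer)}?s=48">',
    "https://forum.example/thread/2": '<img src="/static/default.png">',
}
print(match_known_emails([customer, "sam@example.com"], pages))
```

Note that the page never contains the address itself; the match works purely on the hash.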

They do this and find that you have been commenting on the back pain support site for almost a decade and possible back pain is clearly indicated in your shadow profile.

So when you come to them to ask for lumbar fusion surgery, they know that there is a good chance that is a pre-existing condition you lied about and so they investigate the site's comments a bit, and eventually decline your surgery.

I really hope this is pointless after 1/1/2014, but I think it's a real thing that can be done with MD5 hashes and insurance companies today.

Say I am just a bad guy, I'm an asshole, and I run across you on some wordpress site and you really get on my nerves. And I think, I want to fark that guy over, I know his name is Honest Bender, I wonder is his email address:

honest­*b­en­d­er­[nospam-﹫-backwards]li­amg­*c­o­m
hon­estb­ender[nospam-﹫-backwards]liam­g­*co­m
hben­der[nospam-﹫-backwards]l­ia­m­g­*co­m
hon­es­t*be­n­der­[nospam-﹫-backwards]tops­t­o­h*c­o­m
hb­en­der[nospam-﹫-backwards]tops­toh*c­om
h­ben­der­[nospam-﹫-backwards]oohay*com
...

Well I can actually figure that out by sticking your name in an email generator spreadsheet (https://docs.google.com/spreadsheet/ccc?key=0AoW7aksoVU98dGNFSUtfeXg4akpNTWM0Z2pHWjJzZUE#gid=0 ) and then searching / comparing it with the md5 hash in the wordpress site's page.
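
The guess-and-compare idea can be sketched in Python as follows. This is a hedged illustration, not the spreadsheet mentioned above: the address patterns, the example.com/example.org domains, and the function names are all invented, and the point is only to show why an unsalted hash of a guessable address offers little protection (for instance, to check how guessable your own address is).

```python
import hashlib
from itertools import product


def email_hash(email):
    """MD5 hex digest of the trimmed, lowercased address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def candidate_addresses(first, last, domains):
    """A few common local-part patterns crossed with a list of domains."""
    first, last = first.lower(), last.lower()
    local_parts = [f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}"]
    return [f"{local}@{domain}" for local, domain in product(local_parts, domains)]


def recover_address(target_hash, candidates):
    """Return the first candidate whose hash equals the published one, else None."""
    for address in candidates:
        if email_hash(address) == target_hash:
            return address
    return None


# Hypothetical published hash, built here from a made-up address.
published = email_hash("hbender@example.org")
guesses = candidate_addresses("Honest", "Bender", ["example.com", "example.org"])
print(len(guesses), "candidates ->", recover_address(published, guesses))
```

An address that follows none of the guessed patterns simply returns None, which is the practical argument for using a throwaway address on public comment forms.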

Now I have your real email address. Maybe I use that to spam you. Maybe I use that as one piece in stealing your identity.

There are a lot of other nasty things that can be done with that as well.

Google gravatar privacy leak and you'll find various complaints.

Again, some people think this is a big deal, and I lean that way. Some people, including Wordpress (which owns Gravatar), think it's nothing whatsoever.


http://www.developer.it/post/gravatars-why-publishing-your-email-s-hash-is-not-a-good-idea
 
2013-10-08 07:21:36 PM

RoyBatty: Honest Bender: RoyBatty: Bla Bla MD5 Bla Bla

I still don't see how this search tool would be useful.  Like, at all.

Well, I'll repeat one example I happen to like


I understand your example. You don't have to explain it to me again.  But since you repeated yourself, let me respond again:

I still don't see how this search tool would be useful. At all.
Your example does not change my opinion.
 
2013-10-08 07:29:06 PM

Honest Bender: RoyBatty: Honest Bender: RoyBatty: Bla Bla MD5 Bla Bla

I still don't see how this search tool would be useful.  Like, at all.

Well, I'll repeat one example I happen to like

I understand your example. You don't have to explain it to me again.  But since you repeated yourself, let me respond again:

I still don't see how this search tool would be useful. At all.
Your example does not change my opinion.


I apologize for wasting your time, twice now.
 
Displayed 47 of 47 comments
