Firstly, thanks for not giving up on the discussion yet
It's proving a very difficult topic to discuss in this communication medium, but I think we're gradually making progress...
Hmmm... that discussion is in the 'Design and Development' forum... sorry about that.
With Eric's permission, I'll move it to the 'Feature Request' forum so you can read it...
Great, thanks.
Ok, you convinced me...
Actually, I can think of something even simpler...
Just add a little checkbox that says something like 'Enable Built-in Anti-Phishing Protection' (with a 'More info' link providing detailed information on the algorithm used and how and when it is applied). It would be enabled by default for new accounts (but disabled on existing accounts for backwards compatibility), and would apply the following algorithm to a basic TLD entry like 'yahoo.com':
Only match when the entry is preceded by a properly formatted "https?://*.?" prefix (I think I did that right), with no embedded redirects (or whatever you call them) in the path... so just putting 'yahoo.com' would, all by itself, accomplish the same thing as your example - unless I missed something again...
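To make that concrete, here's a rough Python sketch of how a basic TLD entry might be turned into that kind of strict matcher. The helper name and the exact regexp are my own guesses at what's being described, not anything the extension actually does:

```python
import re

def tld_entry_to_matcher(tld_entry):
    # Hypothetical sketch: build a matcher that only accepts http(s) URLs
    # whose host ends in the given domain, with no '@' tricks in the
    # authority and nothing allowed between the domain and the path.
    domain = re.escape(tld_entry)
    return re.compile(r'^https?://([^/@]+\.)?' + domain + r'(/|$)')

matcher = tld_entry_to_matcher('yahoo.com')
print(bool(matcher.match('https://login.yahoo.com/account')))       # True
print(bool(matcher.match('http://yahoo.com.evil.example/login')))   # False
print(bool(matcher.match('http://login.yahoo.com@evil.example/')))  # False
```

The key point of the sketch is that nothing except an optional subdomain may appear between 'https?://' and the domain itself.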
Then, the same algorithm could also be applied to custom regex and wildcard entries (even existing ones), but only in a 'Warning' mode - ie, when you open an existing account and look at the URLs Tab, if any of the existing wildcard/regex entries are unsafe, they would be flagged somehow (but would continue to work as they did previously). This would enable anyone to fix existing broken ones when convenient for them. This would also allow a visual indicator when creating new ones to warn of any typos or unsafe expressions...
Lastly, when the above Checkbox is enabled, the Custom Regex/Wildcard list/management stuff is hidden - when unchecked, it is visible...
I'm struggling to understand the above, sorry - but it sounds like you're still suggesting taking a domain like 'yahoo.com', converting it into a regexp like 'https?://[^/]+\.yahoo\.com/.*', and then matching the URL of the current page against that regexp. But in my opinion this is unnecessarily complicated and the wrong way round. Why? Well, the end goal is to find out whether the URL of the current page corresponds to a particular account, right? So rather than converting the domain of a particular account to something more complicated (a regexp) and then doing a regexp match, it's much easier to convert the current URL to something simpler (a domain) and then do a simple string equality test. And as I said before, this latter conversion is a hashing function which is controlled by the URL component checkboxes in the Defaults account.
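For comparison, here's a minimal Python sketch of that hashing-and-string-equality approach. The function name is made up, only the subdomain checkbox is modelled, and the "keep the last two labels" rule is a deliberate simplification (real registered-domain handling, e.g. for co.uk, is more involved):

```python
from urllib.parse import urlparse

def hash_url(url, use_subdomain=False):
    # Reduce the current page's URL to the simple value stored with the
    # account; which parts survive would be controlled by the URL component
    # checkboxes in the Defaults account.
    host = urlparse(url).hostname or ''
    if use_subdomain:
        return host                        # e.g. 'login1.msn.com'
    return '.'.join(host.split('.')[-2:])  # e.g. 'msn.com' (naive split)

# Matching is then a plain string equality test, no regexps involved:
account_value = 'yahoo.com'
print(hash_url('https://login.yahoo.com/account') == account_value)      # True
print(hash_url('http://yahoo.com.evil.example/login') == account_value)  # False
```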
Secondly, regexps are much more complicated to work with than you might think. Analysing whether a particular regexp is subject to phishing attacks is distinctly non-trivial, so I would strongly suggest the following approach:
(1) Greatly reduce the need for any user (even an advanced one) to write regexps
(2) If a user really knows what they are doing and really needs a regexp, then they should be allowed enough rope to hang themselves, with appropriate warnings about the risks of phishing.
The first (1) is solved in two ways. Firstly, applying the hashing function to the current URL and comparing it against the automatically generated "Use the following..." value for each account, as described in my 4-step algorithm, eliminates the need for regexps in 90% of cases.
In the other 10%, the user wants a single account to match multiple domains (say). So in addition to the "Use the following..." value which will be used for matching against the domain the account was originally created for, the user needs to be able to specify extra domains which the account will also match.
For instance, let's assume that microsoft.com and msn.com are federated into a single sign-on system, which for the sake of example requires passwords which only contain letters and digits. So the user creates a new account 'microsoft.com' which auto-generates passwords satisfying this constraint. Automatically, the 4-step algorithm ensures that the same password is generated for both
http://login1.microsoft.com/foo/bar and
https://login2.microsoft.com/baz/qux.
Now the user wants to make this same account also match the msn.com domain. This is where the second solution for the 10% of cases comes in. The user simply adds a new pattern *://*.msn.com/** to the account, which is phishing-proof as per my ** syntax extension suggestion, or if they want, they add a new regexp https?://[^/]+\.msn\.com/.* which achieves almost the same thing.
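Here's a small sketch of how that pattern could be interpreted under the proposed ** extension. The rule assumed here (a single * matches anything except '/', while ** matches anything at all) is my reading of the suggestion, not a spec:

```python
import re

def glob_to_regexp(pattern):
    # Assumed semantics: '*' stops at '/', '**' does not.  Because the '/'
    # after the domain must be matched literally, a phishing host like
    # login.msn.com.evil.example cannot sneak past the pattern.
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**', i):
            out.append('.*')
            i += 2
        elif pattern[i] == '*':
            out.append('[^/]*')
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile('^' + ''.join(out) + '$')

p = glob_to_regexp('*://*.msn.com/**')
print(bool(p.match('https://login2.msn.com/baz/qux')))         # True
print(bool(p.match('http://login.msn.com.evil.example/foo')))  # False
```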
Or, even better, we could extend/change the UI so that in addition to being able to specify glob and regular expression patterns, the user can specify additional domains to do the plain string equality test against - in other words, they are treated identically to the 'Use the following...' value. IMHO this is the safest option and simplest for the user to understand.
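A sketch of that last variant, where the extra domains are checked with exactly the same string equality as the auto-generated value (the 'extra_domains' field name is hypothetical, and the two-label domain split is again a simplification):

```python
from urllib.parse import urlparse

def account_matches(account, url):
    # Hypothetical data model: besides the auto-generated value, an account
    # carries a plain set of additional domains; matching is just string
    # equality / set membership, so there is nothing for the user to get wrong.
    host = urlparse(url).hostname or ''
    domain = '.'.join(host.split('.')[-2:])   # naive registered-domain guess
    return domain == account['value'] or domain in account['extra_domains']

account = {'value': 'microsoft.com', 'extra_domains': {'msn.com'}}
print(account_matches(account, 'https://login2.msn.com/baz/qux'))      # True
print(account_matches(account, 'http://msn.com.evil.example/login')))  # False
```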
The second (2) is solved simply by adding a warning to the UI in the place where new patterns can be added.
And, of course, I posted that *before* I went back and re-read your 4-step algorithm...
I think that your 4-step algorithm is actually accomplishing the same thing, *except* that it depends on the user actually checking all of the different URL parts.
I don't understand - which URL parts do you mean? I don't see how it depends on the user checking anything.
My method doesn't - it *automatically* applies them as I described - then just compares them to the TLD of the account.
Applies what? Are you talking about that TLD domain->regexp conversion again? If so, recall that I suggest above that IMHO it's unnecessary and over-complicated.
One thing I missed - the 'Use the following URL...' box would need to be changed to something like 'URL TLD for this Account'. The main issue I see is that this field should not be easily editable, because if the user changes it, there would not be a match.
Agreed. Although 'URL TLD' is not correct either, because the user might check the subdomain checkbox in the URL components bit of the Defaults account. In that case, login1.msn.com and login2.msn.com would be deliberately treated differently.