Calgarypuck Forums - The Unofficial Calgary Flames Fan Community
Old 03-23-2010, 04:10 PM   #41
FanIn80
GOAT!
 
FanIn80's Avatar
 
Join Date: Jun 2006
Exp:
Default

Quote:
Originally Posted by Mad Mel View Post
Huh. I'm an advocate of implementing constraints on the database so my data integrity isn't dependent on an application.
Oh cool. That sounds like something I'll probably learn about in an upcomming class...

Although, if I understand you correctly, are you saying that there are things that can be done at the DB level to deal with something like what MelB posted above? I mean, besides the only thing my limited skills can come up with, which is just a bunch of update queries.

We covered constraints briefly last week. How would you use one to make sure someone spells the name of a city right?

Last edited by FanIn80; 03-23-2010 at 04:13 PM.
FanIn80 is offline   Reply With Quote
Old 03-23-2010, 04:21 PM   #42
FanIn80
GOAT!
 
FanIn80's Avatar
 
Join Date: Jun 2006
Exp:
Default

I was just thinking about this some more. In the past, I've always put city, province, country names in their own tables. Then I pull the records from each of those tables down into a relevant combo box so people can just select the names from there.

Obviously, I'd give them an add/edit form so they could add the cities, states or countries they needed... The way I looked at it, if the guy entering the city screwed it up, at least it was screwed up consistantly. A quick edit from the edit screen, which called a background update querey to update existing records using that name... and voilà!

Don't get me wrong, I'm sure this is all pretty amatuerish. I'm actually starting to really look forward to some of these upcoming classes.
FanIn80 is offline   Reply With Quote
Old 03-23-2010, 04:41 PM   #43
MelBridgeman
Lifetime Suspension
 
Join Date: Mar 2007
Location: Calgary
Exp:
Default

The software used by the realtors to enter this data is provided by the MLS company, I just get a daily file from them (in this case anyways) that I import into the database. Not sure why this company would allow input into a text field, instead of a combo box...

But i deal with actually way more MLS companies accross Canada and this is the only company that provides the data like this.
MelBridgeman is offline   Reply With Quote
Old 03-23-2010, 05:33 PM   #44
Mad Mel
First Line Centre
 
Mad Mel's Avatar
 
Join Date: Mar 2009
Location: Brisbane, Australia
Exp:
Default

Quote:
Originally Posted by FanIn80 View Post
I was just thinking about this some more. In the past, I've always put city, province, country names in their own tables. Then I pull the records from each of those tables down into a relevant combo box so people can just select the names from there.

Obviously, I'd give them an add/edit form so they could add the cities, states or countries they needed... The way I looked at it, if the guy entering the city screwed it up, at least it was screwed up consistantly. A quick edit from the edit screen, which called a background update querey to update existing records using that name... and voilà!

Don't get me wrong, I'm sure this is all pretty amatuerish. I'm actually starting to really look forward to some of these upcoming classes.
You're pretty much describing how I'd do it. Use a foreign key to define a one-to-many relationship between city and houses (and another for province to cities, etc.). The city column would have a unique key so it can't be entered more than once, and voila, the city is only spelled in one place in the entire database. Your edit screen can add a new city, or change the spelling in the one record where the city exists.

As a developer, if I were handed a database diagram which has that relationship, I'd instantly think dropdown box, since the database says that city has to come from this list. If the city name were a column in the same table as the house, I'd think text box, since the database is saying that the city name could be anything.

(...and then I'd build an OLAP cube to hold the aggregated statistics for all the MLS properties in Canada... muahahahaha... holy dorkfest)

A key thing with database development is to put all your data validation in the database, whenever possible. It actually makes the developer's life easier, because they will get errors while building and testing which may have otherwise gone unnoticed. As well, the schema will help them in understanding the logical data structure.
Mad Mel is offline   Reply With Quote
Old 03-23-2010, 06:05 PM   #45
MelBridgeman
Lifetime Suspension
 
Join Date: Mar 2007
Location: Calgary
Exp:
Default

Let's discuss this further...because from my point of view, there is no way to stop this from happening.

1. I get a daily CSV file from this MLS provider, each field is the fully qualified description. I do not get any lookup tables from the MLS provider. Just a decoded file.

2. The importer reads the file and builds the lookup tables based on the data. So for the city lookups, it will compare the city field in the CSV against whats in the lookups, based on the description, if its not in there it has to add it and assigns the generated lookup id to the listing, if its located it assigns the pk to that listing....So if cities are spelt wrong, there is nothing i can do initially.

Since the lookup id of the correct spelling of Fort Mac isn't gonna change, i can update the importer to fix the lookup data and the listings once the fresh data is in.

But on initial design and without having any control over the MLS app that realtors use to enter this data, there is nothing I can do about it.

Last edited by MelBridgeman; 03-23-2010 at 06:10 PM.
MelBridgeman is offline   Reply With Quote
Old 03-23-2010, 06:16 PM   #46
MelBridgeman
Lifetime Suspension
 
Join Date: Mar 2007
Location: Calgary
Exp:
Default

I should note that every time there is another wrong spelling for fort mac, i still have up update the importer with the correct lookup ids that need to be changed to the correct fort mac lookup id

Last edited by MelBridgeman; 03-23-2010 at 07:40 PM.
MelBridgeman is offline   Reply With Quote
Old 03-23-2010, 06:53 PM   #47
Mad Mel
First Line Centre
 
Mad Mel's Avatar
 
Join Date: Mar 2009
Location: Brisbane, Australia
Exp:
Default

Yeah, you're kinda victimized by a poor data design that you've inherited.

So are you using a two-level lookup?

Table 1 has the raw data. It joins to Table 2, which has all the unique spellings of all the cities (misspellings and correct spellings). It joins to Table 3, which has only the unique correct spellings.

Table 2 would contain all the historic misspellings and correct spellings, so you don't have to re-add all the misspellings every time. Then when you get a new csv, you'd only have to add the new misspellings to Table 2.

In your output view, just use the correct city name from Table 3.

Maybe that's what you're already doing, though?

Last edited by Mad Mel; 03-23-2010 at 06:55 PM.
Mad Mel is offline   Reply With Quote
Old 03-23-2010, 07:34 PM   #48
FanIn80
GOAT!
 
FanIn80's Avatar
 
Join Date: Jun 2006
Exp:
Default

Maybe there's something you can do with the postal codes? The first three characters from each postal code will tell you which city it's from... all you need to do is get a list of postal codes for each province and dump them in a table along with the proper city name. Then you can do your lookup by postal code prefix?

Mind you, if they typo a postal code, you're really screwed. It's no longer just a misspelled city name, it's a completely different city name.
FanIn80 is offline   Reply With Quote
Old 03-23-2010, 07:44 PM   #49
photon
The new goggles also do nothing.
 
photon's Avatar
 
Join Date: Oct 2001
Location: Calgary
Exp:
Default

Quote:
Originally Posted by MelBridgeman View Post
Let's discuss this further...because from my point of view, there is no way to stop this from happening.

1. I get a daily CSV file from this MLS provider, each field is the fully qualified description. I do not get any lookup tables from the MLS provider. Just a decoded file.

2. The importer reads the file and builds the lookup tables based on the data. So for the city lookups, it will compare the city field in the CSV against whats in the lookups, based on the description, if its not in there it has to add it and assigns the generated lookup id to the listing, if its located it assigns the pk to that listing....So if cities are spelt wrong, there is nothing i can do initially.

Since the lookup id of the correct spelling of Fort Mac isn't gonna change, i can update the importer to fix the lookup data and the listings once the fresh data is in.

But on initial design and without having any control over the MLS app that realtors use to enter this data, there is nothing I can do about it.
Well there's not nothing, I bet that if I entered in every single one of your incorrect searches into Google, it would ask me "Do you mean Fort McMurray?" or "Showing Results for Fort McMurray".

What about something using phonemes?

Anyway, this is a good excuse to post this link to a great article on Google, some insights on how they do what they do.

http://www.wired.com/magazine/2010/0...ogle_algorithm
__________________
Uncertainty is an uncomfortable position.
But certainty is an absurd one.
photon is offline   Reply With Quote
The Following 3 Users Say Thank You to photon For This Useful Post:
Old 03-23-2010, 07:45 PM   #50
FanIn80
GOAT!
 
FanIn80's Avatar
 
Join Date: Jun 2006
Exp:
Default

Awesome. I love finding good articles about how they do things.
FanIn80 is offline   Reply With Quote
Old 03-23-2010, 07:52 PM   #51
photon
The new goggles also do nothing.
 
photon's Avatar
 
Join Date: Oct 2001
Location: Calgary
Exp:
Default

I don't think I've ever met a DBA that is a Mac aficionado...
__________________
Uncertainty is an uncomfortable position.
But certainty is an absurd one.
photon is offline   Reply With Quote
Old 03-23-2010, 07:53 PM   #52
FanIn80
GOAT!
 
FanIn80's Avatar
 
Join Date: Jun 2006
Exp:
Default

Quote:
Originally Posted by photon View Post
I don't think I've ever met a DBA that is a Mac aficionado...
Rut ro. Are they Linux guys?
FanIn80 is offline   Reply With Quote
Old 03-23-2010, 07:55 PM   #53
MelBridgeman
Lifetime Suspension
 
Join Date: Mar 2007
Location: Calgary
Exp:
Default

I should say that by saying "nothing I can do", really means, there is nothing I can do given the time and money it might take to something, I need approval on everything I do, so right now, anything more elaborate is not in the cards.
MelBridgeman is offline   Reply With Quote
Old 03-23-2010, 07:57 PM   #54
Mad Mel
First Line Centre
 
Mad Mel's Avatar
 
Join Date: Mar 2009
Location: Brisbane, Australia
Exp:
Default

Quote:
Originally Posted by photon View Post
Well there's not nothing, I bet that if I entered in every single one of your incorrect searches into Google, it would ask me "Do you mean Fort McMurray?" or "Showing Results for Fort McMurray".
SQL Server Integration Services has a 'Fuzzy Lookup' transformation for this purpose, that a person could plug into their data loading package. I think MB is using MySQL or something, perhaps it has something similar?
Mad Mel is offline   Reply With Quote
Old 03-23-2010, 08:20 PM   #55
photon
The new goggles also do nothing.
 
photon's Avatar
 
Join Date: Oct 2001
Location: Calgary
Exp:
Default

Quote:
Originally Posted by FanIn80 View Post
Rut ro. Are they Linux guys?
Mostly, or just use whatever the company gives them which is Windows and complain about that.

There probably is though, if all the old beardy Java developers have been replaced by young hip soul patchy developers with Groovy and Grails and Macbooks, then DBAs are probably the same.
__________________
Uncertainty is an uncomfortable position.
But certainty is an absurd one.
photon is offline   Reply With Quote
Old 03-23-2010, 08:25 PM   #56
FanIn80
GOAT!
 
FanIn80's Avatar
 
Join Date: Jun 2006
Exp:
Default

Haha nice.
FanIn80 is offline   Reply With Quote
Old 03-23-2010, 08:43 PM   #57
Bobblehead
Franchise Player
 
Bobblehead's Avatar
 
Join Date: Jul 2005
Location: in your blind spot.
Exp:
Default

Speaking with an Oracle rep, they were saying 90+% was *NIX or Solaris, especially with the big companies. They really only get into the Windows world when they get into the SMB segment.
__________________
"The problem with any ideology is that it gives the answer before you look at the evidence."
—Bill Clinton
"The greatest obstacle to discovery is not ignorance--it is the illusion of knowledge."
—Daniel J. Boorstin, historian, former Librarian of Congress
"But the Senator, while insisting he was not intoxicated, could not explain his nudity"
—WKRP in Cincinatti
Bobblehead is offline   Reply With Quote
Old 03-23-2010, 11:15 PM   #58
JustAnotherGuy
Lifetime Suspension
 
Join Date: Oct 2008
Location: Calgary
Exp:
Default

Quote:
Originally Posted by Mad Mel View Post
.......

A key thing with database development is to put all your data validation in the database, whenever possible. It actually makes the developer's life easier, because they will get errors while building and testing which may have otherwise gone unnoticed. As well, the schema will help them in understanding the logical data structure.
There are far too many factors to make a blanket statement like that. The database should have constraints to ensure valid data is captured AND the application should have validations to ensure the captured data is clean.

What validations are done where is dependent on the deployment of the system. You don't want to round trip from the client to the server and have an simple validation error that the application could have done.
JustAnotherGuy is offline   Reply With Quote
Old 03-23-2010, 11:20 PM   #59
Hack&Lube
Atomic Nerd
 
Join Date: Jul 2004
Location: Calgary
Exp:
Default

Where are you taking these courses BTW?
Hack&Lube is offline   Reply With Quote
Old 03-23-2010, 11:25 PM   #60
Mad Mel
First Line Centre
 
Mad Mel's Avatar
 
Join Date: Mar 2009
Location: Brisbane, Australia
Exp:
Default

Fair enough, though I never said that you can't validate on both ends. In principle, relying on the application to validate data is not a good way to go, and I stand by that. However there's certainly no reason that the application developer can't choose to validate data before it ever gets to the database, for any reason including performance or explicit, helpful error handling.
Mad Mel is offline   Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT -6. The time now is 10:02 AM.

Calgary Flames
2024-25




Powered by vBulletin® Version 3.8.4
Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright Calgarypuck 2021 | See Our Privacy Policy