As RootsTech nears, FamilySearch TechTips will feature some of the 2012 presenters and the content of their presentations. We begin this series with Dallan Quass and his three RootsTech 2012 presentations, which are geared toward intermediate-level software developers.
An Open-source Place-finder for Genealogy
This class will be held Thursday, Feb 2 from 3:00 – 4:00 pm MST.
When comparing people for potential duplicates or creating programs with mapping capabilities, identifying and standardizing places is important. Genealogy has a special challenge in this area, because places change names and jurisdictions over time. This talk describes an open-source place-finder: a program and database to map genealogist-recorded place texts to standardized places.
Genealogists record places in all sorts of ways. Sometimes they follow the recommended standard, but often they leave out jurisdictional levels or get intermediate levels wrong. The newly released open-source place-finder described in this talk maps genealogist-recorded place texts to standardized places with geo-positioning, and you can incorporate it into your own software projects. For more information on this topic, please see Dallan’s syllabus.
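To make the problem concrete, here is a minimal Python sketch of matching a recorded place text against standardized places when jurisdictional levels are abbreviated or missing. It is not Dallan's place-finder. The `GAZETTEER` entries, the `ABBREVIATIONS` table, the approximate coordinates, and the function names are all made up for illustration, and the sketch ignores the harder problem of names and jurisdictions changing over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    levels: tuple  # jurisdiction names, most specific first
    lat: float     # approximate, for illustration only
    lon: float


# Hypothetical two-entry gazetteer; a real database would be far larger.
GAZETTEER = [
    Place(("salem", "essex", "massachusetts", "united states"), 42.52, -70.90),
    Place(("salem", "marion", "oregon", "united states"), 44.94, -123.04),
]

# Hypothetical abbreviation table.
ABBREVIATIONS = {
    "mass": "massachusetts",
    "ma": "massachusetts",
    "ore": "oregon",
    "usa": "united states",
}


def normalize(text):
    """Split on commas, lowercase, drop trailing periods, expand abbreviations."""
    tokens = []
    for part in text.split(","):
        token = part.strip().lower().rstrip(".")
        if token:
            tokens.append(ABBREVIATIONS.get(token, token))
    return tokens


def is_subsequence(tokens, levels):
    """True if tokens appear in levels in order, allowing skipped levels."""
    remaining = iter(levels)
    return all(token in remaining for token in tokens)


def find_places(text):
    """Return every gazetteer entry consistent with the recorded text."""
    tokens = normalize(text)
    if not tokens:
        return []
    return [
        place
        for place in GAZETTEER
        if place.levels[0] == tokens[0] and is_subsequence(tokens, place.levels)
    ]
```

With this toy data, `find_places("Salem, Mass.")` returns only the Massachusetts entry even though the county is missing, while `find_places("Salem")` returns both candidates and leaves the ambiguity for the caller to resolve.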
A Robust Open-source GEDCOM Parser
This class will be held Friday, Feb 3 from 11:00 am – 12:00 pm MST.
One of the biggest hurdles in developing genealogy applications is writing a GEDCOM parser. Parsing GEDCOM files is a lot like parsing HTML in the late 1990s: real-world files are full of extensions and errors. This talk describes an open-source GEDCOM parser that is robust in the face of both.
We present an open-source GEDCOM parser. The parser parses GEDCOM files into a de facto object model, which is able to represent nearly all of the tag sequences found in real-world GEDCOM files. The object model includes common custom tags; other tags are represented as extensions. The object model has a JSON representation, and the toolkit includes a GEDCOM exporter. This makes it possible for anyone to read a GEDCOM file, manipulate its contents, save it to JSON, and export it back to a GEDCOM file, without loss of information for the vast majority of GEDCOMs. For more information on this topic, see Dallan’s syllabus.
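The read, convert to JSON, and export cycle described above can be illustrated with a small Python sketch. This is not the parser from the talk and does not reproduce its object model. It only assumes the basic GEDCOM line shape (level, optional `@xref@`, tag, optional value), treats tags that start with an underscore as extensions, and collects unparseable lines instead of failing. The names `parse_gedcom` and `export_gedcom` and the dictionary layout are hypothetical.

```python
import json
import re

# level, optional @xref@, tag, optional value
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@\s]+@)\s+)?(\w+)(?:\s(.*))?$")


def parse_gedcom(text):
    """Parse GEDCOM text into nested dicts; return (records, errors)."""
    root = {"children": []}
    stack = [(-1, root)]  # (level, node) pairs for the currently open nodes
    errors = []
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue
        match = LINE_RE.match(raw)
        if match is None:
            errors.append((number, raw))  # keep going instead of aborting
            continue
        level = int(match.group(1))
        node = {
            "xref": match.group(2),
            "tag": match.group(3),
            "value": match.group(4),
            "extension": match.group(3).startswith("_"),
            "children": [],
        }
        # Close nodes at the same or a deeper level, then attach to the parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"], errors


def export_gedcom(records):
    """Write the nested dicts back out as GEDCOM text."""
    lines = []

    def walk(node, level):
        parts = [str(level)]
        if node["xref"]:
            parts.append(node["xref"])
        parts.append(node["tag"])
        if node["value"]:
            parts.append(node["value"])
        lines.append(" ".join(parts))
        for child in node["children"]:
            walk(child, level + 1)

    for record in records:
        walk(record, 0)
    return "\n".join(lines) + "\n"


SAMPLE = """\
0 HEAD
1 SOUR Example
0 @I1@ INDI
1 NAME John /Smith/
1 _NICK Jack
1 BIRT
2 DATE 1 JAN 1850
2 PLAC Salem, Essex, Massachusetts
this line is malformed
0 TRLR
"""

records, errors = parse_gedcom(SAMPLE)
as_json = json.dumps(records, indent=2)         # JSON representation
round_trip = export_gedcom(json.loads(as_json))  # back to GEDCOM text
```

In this sketch the custom `_NICK` tag survives the round trip flagged as an extension, and the malformed line is reported in `errors` rather than stopping the parse. A production parser also has to deal with character encodings, `CONT`/`CONC` continuation lines, and many other real-world quirks that are left out here.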
An Open-source Similar-name Finder
This class will be held Friday, Feb 3 from 3:00 – 4:00 pm MST.
This talk describes an open-source similar-name finder that returns a list of similar names to include in searches whenever a particular name is searched. It is designed to miss fewer true name variants than Soundex.
One of the biggest problems facing beginning genealogists is figuring out all the different ways their ancestors spelled their names. Names weren’t spelled in a standardized manner until fairly recently, and illiterate ancestors often had their names listed under different spellings. Soundex often misses close variants and includes other variants that aren’t very close. This talk describes an open-source similar-name finder that gives a 28% reduction in “false negatives” (names that should be searched together but are not) in comparison to Soundex and the similar-name algorithms developed by two large genealogy organizations. For more information on this topic, see Dallan’s syllabus.
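For readers who have not worked with Soundex, the following Python sketch of the standard American Soundex coding shows both kinds of error mentioned above. It illustrates the baseline only, not the similar-name finder from the talk, and the example names are chosen here for illustration.

```python
# Standard American Soundex consonant groups.
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code for a name."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset this, so repeated codes count again
        if len(result) == 4:
            break
    return (result + "000")[:4]


# A close variant that Soundex misses: the first letter is always kept.
knight, night = soundex("Knight"), soundex("Night")

# Names that are not close but share a code.
smith, sand = soundex("Smith"), soundex("Sand")
```

Here "Knight" and "Night" receive different codes (`K523` and `N230`) because Soundex always keeps the first letter, so a search on one never finds the other. "Smith" and "Sand" both code to `S530`, so they are searched together even though they are unlikely to be the same surname.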
More about Dallan
Dallan Quass is the developer behind WeRelate.org, the world’s largest public genealogy wiki. He has a PhD in computer science from Stanford University and BS and MS degrees from BYU. Prior to WeRelate, he was CTO for FamilySearch. Earlier, he co-founded WhizBang! Labs; FlipDog.com, which was acquired by Monster.com; and Junglee, which was acquired by Amazon.com. He has written 40 publications that have been cited over 3,000 times, according to academic.research.microsoft.com.