This is a guest post by Audrey Boguchwal of Samasource, Quid’s longtime business partner in our company data acquisition pipeline. It was originally published here to showcase how Samasource’s web research and data cleaning services help collect data for Quid to build our natural language processing-powered data platform.
How does Quid create business intelligence?
Here at Quid, we are working to ingest the world’s collective intelligence. Our goal? Help business decision makers quickly gain the insights they need to make more informed decisions. Quid’s product is a platform that searches, analyzes and visualizes data to deliver key insights. Customers bring questions such as, “What does the current landscape look like for financial technology?” For each question, Quid creates a custom visual interactive map from raw business data: documents and articles.
The map illustrates the network of relationships between documents: the closer together the nodes, the more similarities exist between the data. With Quid, users can quickly find topic similarities, uncover hidden relationships, and draw insights that might otherwise take a researcher hundreds of hours.
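To make the idea of "similarity between documents" concrete, here is a minimal sketch that scores pairs of texts by the cosine similarity of their word counts. This is only an illustration: the post does not describe Quid's actual algorithm, and the sample documents and function names below are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # An empty text has no direction, so treat its similarity as zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical documents, invented for illustration only.
docs = {
    "payments_a": "mobile payments app for small merchants",
    "payments_b": "payments platform for online merchants",
    "biotech": "gene therapy research for rare diseases",
}

# Score every pair; higher scores would sit closer together on a map.
names = list(docs)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = cosine_similarity(docs[first], docs[second])
        print(first, second, round(score, 2))
```

In this toy setup the two payments documents score higher with each other than either does with the biotech document, which is the kind of signal a network layout could use to place nodes nearer or farther apart.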
To create its visual maps, the Quid platform pulls information from three premium data sets: Companies, News and Blogs, and Patents (as well as custom uploaded data). Samasource helps Quid ensure that our Company data set contains high quality data to generate insightful results.
How does Quid source data for its algorithm?
Thousands of new companies are formed around the world every day, so Quid is always sourcing new company descriptions. While these descriptions, from business data sources like CapitalIQ and Crunchbase, are accurate, they aren’t sufficiently rich for Quid’s NLP (Natural Language Processing) algorithms.
Some common challenges with “off-the-shelf” Company descriptions:
- Company descriptions are too standard, use common jargon, and don’t contain enough specific information about the company’s business and product
- “SuperCo is positioned to revolutionize delivery of value to its customers and is headquartered in BestCity, USA”
- Company descriptions are too brief, especially for new companies
- “AwesomeCo works in technology space”
- Descriptions become obsolete after a company moves to a different space – which can happen often in today’s fast-moving world
How Quid uses Samasource to improve data inputs
Samasource workers help Quid by conducting online research to add more information to Company descriptions. Additional information makes the descriptions richer and more specific so that Quid’s algorithm can find true connections between the companies and generate better visual maps.
Quid uses Samasource’s API to send company descriptions from business data sources directly to the Hub software. The API helps Quid work at scale to keep up with the influx of new companies. It also allows the Quid team to prioritize work for Samasource.
The Hub’s task queue enables a dedicated, trained team of Samasource workers to review and update company descriptions simultaneously. Each worker is presented with one company and description at a time. Workers conduct research on the company’s website, trustworthy news media and any other relevant sources. Workers edit existing company descriptions to ensure that they’re accurate, detailed, and up-to-date. Workers also write descriptions for companies that are too new to have one. A Samasource quality analyst inspects completed descriptions and then data is sent back to Quid over the API.
What does high data quality look like?
Quality data is important for Quid, but how do you define quality for a company description? When the project began, Samasource and Quid teams conducted multiple rounds of review and training to structure guidelines. The teams landed on reasonable minimum word counts and jargon- and marketing-free language. So what’s jargon and what’s not? The team developed a test: “If you read a value proposition in a company description, and McDonald’s could also say the same thing about their business, then don’t include it. It’s not specific enough.” With clear guidelines in place, Samasource workers have been able to consistently produce quality results for Quid.
Quid knows that quality output depends not only on clear guidelines, but also on the data workers are sent. A rich company description depends on that company’s media and web presence. In cases when the minimum word count for a description cannot be met, Samasource workers add a reason code to let the Quid team know why, for example, “company is too new” or “out of business”.
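The guidelines above (a minimum word count, jargon-free language, and a reason code when the minimum cannot be met) could be checked automatically along these lines. This is a hedged sketch, not the teams' actual tooling: the threshold of 50 words, the phrase list, and all names are assumptions made for illustration, since the post does not give specific values.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values: the post mentions a minimum word count and a jargon
# test but does not publish the number or a phrase list.
MIN_WORDS = 50
GENERIC_PHRASES = ("revolutionize", "delivery of value", "world-class")


@dataclass
class Review:
    accepted: bool
    problems: list[str]


def review_description(
    text: str,
    reason_code: Optional[str] = None,
    min_words: int = MIN_WORDS,
) -> Review:
    """Check one company description against the illustrative guidelines."""
    problems = []
    lowered = text.lower()
    # Flag generic marketing language that any company could claim.
    for phrase in GENERIC_PHRASES:
        if phrase in lowered:
            problems.append(f"generic phrase: {phrase!r}")
    # A short description is acceptable only if a reason code explains it.
    if len(text.split()) < min_words and reason_code is None:
        problems.append("below minimum word count with no reason code")
    return Review(accepted=not problems, problems=problems)


short = "AwesomeCo works in technology space"
print(review_description(short))
print(review_description(short, reason_code="company is too new"))
```

A simple check like this can only catch surface problems. Judging whether a sentence passes the "McDonald's test" still takes a human reader, which is why the workflow relies on trained workers and a quality analyst.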
Quid put in place a QA review process to maintain expected quality levels, with metrics reported into the Datadog-based engineering monitoring system, and scoring that uses machine learning techniques.
One way that Samasource brings value to the table
Samasource workers’ research and writing result in more detailed company descriptions that enable Quid’s NLP algorithms to produce better networks, leading to high-value insights for Quid’s users. Quality data in, better results out. Data cleaning is a big part of what every machine learning and data science team must do to get accurate results from their models. Engineers can build automation around good data – but they first need humans to clean the data.
What do the Samasource workers who conduct research for Quid have to say about the project? Quality reviewers Iyvinne, Benta and Christine enjoy learning about new companies, inventions and technologies around the globe. Workers Joseph, Rosebellah and Gladys say that the project has helped improve their business skills such as reading speed, knowledge of specific business terms, and vocabulary. Team lead Richard sums it up well, saying that the project has sharpened his problem solving abilities and analytical skills. He’s had to put them to good use when helping the team dig up information on complex deals like reverse mergers.
Quid engages with Samasource because this detailed task of research and writing is well suited to humans. Samasource workers get research done reliably and cost-effectively, which helps Quid focus on its core business of deep document analysis.
Algorithms work best when fed with clean data
Quid ingests the world’s collective intelligence to help businesses make better decisions.
With training data expertly cleaned by Samasource’s web research team, the Quid platform can better surface valuable insights to customers using the Company data set.
Interested in helping us solve awesome problems? If so, then head over to our careers page!