Earlier, I posted about San Francisco-based Internet information aggregator Rapleaf, a service that collects, sorts and repackages data about many of us who spend an inordinate amount of time on the Internet. I started poking around and discovered many startups that are using data from Rapleaf, but it’s not just startups. Just take a look at this article on Rapleaf in Fast Company from last year:
By accessing its database of 378,968,953 consumer email profiles, banks, retailers, and anti-fraud firms (all of which it counts among its clients) Rapleaf can quickly confirm legitimate customers and weed out scammers, cutting verification costs and improving the user experience. “Companies spend as much as $100 getting customers to their site. The goal is to filter out the bad people and keep as many good people as possible,” (Joel) Jewitt (Rapleaf’s VP of Business Development) says. “If a customer’s email address is attached to three or four social networking sites with 300 friends, the email likely isn’t fake and the retailer can put that person in the ‘good’ pile.”
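Jewitt's rule of thumb can be written down as a simple filter. The following is a minimal Python sketch. It assumes a hypothetical `EmailProfile` record that holds a count of linked social networks and a friend count. The thresholds (three networks, 300 friends) come from the quote; everything else is illustrative and is not Rapleaf's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class EmailProfile:
    """Hypothetical record of what an aggregator knows about one email address."""
    email: str
    social_networks: int  # number of social sites the address is linked to
    friends: int          # friends/followers across those sites


def likely_legitimate(profile: EmailProfile,
                      min_networks: int = 3,
                      min_friends: int = 300) -> bool:
    # The "good pile" heuristic from the quote: an address tied to several
    # social sites with a sizable friend list is unlikely to be a throwaway.
    return profile.social_networks >= min_networks and profile.friends >= min_friends


def sort_signups(profiles):
    """Split sign-ups into a 'good' pile and a pile that needs more checking."""
    good = [p.email for p in profiles if likely_legitimate(p)]
    review = [p.email for p in profiles if not likely_legitimate(p)]
    return good, review


signups = [
    EmailProfile("alice@example.com", social_networks=4, friends=310),
    EmailProfile("x9f2@example.net", social_networks=0, friends=0),
]
good, review = sort_signups(signups)
print(good)    # ['alice@example.com']
print(review)  # ['x9f2@example.net']
```

A real verification service would weigh many more signals. A hard threshold is used here only because that is how the quote phrases it.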
One of our readers pointed out that because Rapleaf is sending data to these companies, which may be caching your information, there’s more information leaking out about you on the web. Opting out of Rapleaf’s service isn’t going to do you any good. Let’s put it bluntly: For better or worse, the genie is out of the bottle.
How Rapleaf Works
To better understand how, exactly, Rapleaf works, I did some investigating. On a basic level, Rapleaf is like a credit card company’s database. When you’re at a store and the cashier slides your credit card through, the store checks your card information against the credit card company’s database to make sure your card hasn’t expired and you have enough credit.
Rapleaf’s database contains email addresses. Say an airline offers a discount coupon, as long as you provide your email. When you sign up for the coupon, the airline looks up your email address in Rapleaf’s database. Rapleaf confirms the email is valid by checking it against your profile in its database, and the airline then knows it can send you its email newsletter.
When I contacted Rapleaf, the company said it had built its database by crawling the web, looking for connections and building profiles with its own technology. “Like Google, we crawl publicly available data on the web – as long as robots.txt allows search engines like us to crawl (we stop crawling if people disallow search engines),” CEO Auren Hoffman emailed. He added:
Rapleaf is working hard to protect consumers. We are a data company that, like 99 percent of data companies, is opt-out (rather than opt-in). But we are a white-hat data company who helps companies safely provide a more personalized experience to their customers. We try really hard to protect consumers – we’ve thought a lot about consumer protection and are proud of everything we are doing. However, we are open to ideas on how we can improve and I encourage your readers to email me at firstname.lastname@example.org with ideas on how we can improve and better protect consumers. While we cannot commit to implementing any idea from your readers, we can commit that we will consider all thought-out suggestions.
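Hoffman's robots.txt rule is the same check that any well-behaved crawler runs before fetching a page. As a minimal sketch, Python's standard-library `urllib.robotparser` can evaluate a site's rules. The user-agent name `ExampleCrawler` and the sample robots.txt below are hypothetical, and this is not Rapleaf's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is barred from /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Return True if the (hypothetical) rules let this crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules without a network fetch
    return parser.can_fetch(user_agent, url)


print(allowed("https://example.com/profile/jane"))  # True
print(allowed("https://example.com/private/jane"))  # False
```

Note that robots.txt only governs what a cooperative crawler fetches. It says nothing about how the fetched data is combined or resold afterward.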
The company argues that what it does is no different from what various ad networks do, and that its policies are more consumer-friendly. You can opt out of Rapleaf by visiting this location, Hoffman said. Nevertheless, Rapleaf’s services are clearly much in demand, based on this response from CEO Hoffman:
Today we help hundreds of top retailers, hotels, advertising agencies, large brands, tech startups, educational organizations, and nonprofits personalize their customers’ experiences. (We sign NDAs with our customers so we cannot release their names.)
Think of Rapleaf as the provider of a FICO score for an email address. That email address comes with a Facebook ID, a Flickr ID, Twitter account information and other social details. For a marketer, or even someone trying to hit you up for business, this is highly relevant data, because it allows them to target a customer and connect them socially. In another scenario, you can buy an email list of a million addresses for $1,000, check them against Rapleaf and end up with about 10,000 emails worth targeting. That’s a pretty good deal.
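The list-buying scenario is simple arithmetic, sketched below with the article's figures: $1,000 for a million addresses, of which about 10,000 turn out to be worth targeting. The sketch ignores whatever Rapleaf charges for the lookups themselves, which the article does not state.

```python
def list_economics(list_price: float, list_size: int, matched: int):
    """Cost per raw address, match rate, and cost per address worth targeting."""
    cost_per_address = list_price / list_size
    match_rate = matched / list_size
    cost_per_targetable = list_price / matched
    return cost_per_address, match_rate, cost_per_targetable


per_address, rate, per_target = list_economics(1000, 1_000_000, 10_000)
print(f"${per_address:.3f} per raw address")        # $0.001 per raw address
print(f"{rate:.0%} match rate")                      # 1% match rate
print(f"${per_target:.2f} per targetable address")  # $0.10 per targetable address
```

Only 1 percent of the purchased list survives. Even so, each surviving address costs about ten cents before lookup fees.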
A Good Email ID Is Worth Money
In order for Rapleaf to be successful, it needs to keep growing its database of good email addresses. That is why it’s giving startups, such as Facebook game makers and social CRM companies, liberal access to its APIs. When a social CRM company such as Rapportive plugs into your Gmail account, it confirms to Rapleaf that your email address is valid. Since social CRM services create profiles of the people who email you, they also confirm to Rapleaf that your friends’ addresses are valid. Technically, no data is exchanged, but the sheer quantity of look-ups is enough to beef up Rapleaf’s database.
Think of it this way: Companies like Rapportive, by making simple queries, are becoming the sources of the best and highest quality emails/IDs that Rapleaf has ever obtained. I think this is the crux of the problem. Here’s a question I sent to Rapleaf and the answer I received (emphasis mine).
Does Rapportive (and others like it, such as Gist) pay for the service? If yes, how much? What happens to the queries that originate from Rapportive? Say the email is email@example.com: does that data get stored in your databases?
Unfortunately we’re not able to go into details about specific relationships because of our confidentiality agreements, but all of our customers pay us for our service. We do have a free API (up to 1000 queries per month) that many companies use — but companies need to pay for Rapleaf for queries above that. We only allow companies to learn more about their existing customers (and we have never given out email addresses) and when they query their customers’ email, we return the most updated information Rapleaf has associated with that email. If this is a new email we have not seen before, it may be cached to provide better user experience in the future or it can be removed via opt-out.
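The mechanism described in this answer can be sketched in a few lines. Every query is counted against a free monthly tier, and any address not seen before is cached whether or not a profile comes back. The `LookupService` class, its field names and the client name below are hypothetical and are not Rapleaf's actual API. The 1,000-query free tier is taken from the answer above, and the monthly reset is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

FREE_QUERIES_PER_MONTH = 1000  # free tier described in Rapleaf's answer


@dataclass
class LookupService:
    """Hypothetical email-lookup service; not Rapleaf's actual API."""
    profiles: dict = field(default_factory=dict)  # email -> known profile data
    seen: set = field(default_factory=set)        # every address ever queried
    usage: dict = field(default_factory=dict)     # client -> queries this month

    def lookup(self, client: str, email: str) -> Tuple[Optional[dict], bool]:
        email = email.strip().lower()
        self.usage[client] = self.usage.get(client, 0) + 1
        billable = self.usage[client] > FREE_QUERIES_PER_MONTH
        # The query itself is a signal: a customer is actively using this
        # address, so it is cached even when no profile exists yet.
        self.seen.add(email)
        return self.profiles.get(email), billable


svc = LookupService(profiles={"alice@example.com": {"twitter": "@alice"}})
print(svc.lookup("startup-a", "Alice@Example.com"))   # ({'twitter': '@alice'}, False)
print(svc.lookup("startup-a", "newbie@example.org"))  # (None, False)
print("newbie@example.org" in svc.seen)               # True
```

The second call returns nothing to the customer, yet the service still learns that `newbie@example.org` is a live address in use by a real business.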
Given that Rapleaf’s core competency is its ability to take email addresses, map them to data on the web and build a profile, I find the argument that data is cached for a better user experience hard to swallow. With nearly a billion email addresses in its database, every look-up helps Rapleaf cull the best emails from the giant morass of addresses. At least two companies I spoke to have declined to work with Rapleaf and refused its offer of free data, mostly because they found the workflow unsavory, to put it mildly.
Rapleaf’s Startup Web
Regardless, here is a list of Internet startups that have access to data from Rapleaf. Clearly it is incomplete, and, for some of these companies, it is not clear if they send data back to Rapleaf (I’ve noted the companies that confirmed that they only look up data). I am going to update this post with more comments as I get them.
- Rapportive. The CEO has confirmed that the company doesn’t pass any data back and forth.
- eTacts. They say they are not passing information back to Rapleaf.
- Gist. The CTO confirmed the company isn’t passing any information back to Rapleaf.
- Flowtown. Co-founder Ethan Bloch left a comment indicating Flowtown doesn’t pass any information back to Rapleaf.
- SocialShield. Arad Rostampour denied passing any data back to Rapleaf.
As I said earlier, even if the companies aren’t passing any data, every time they do an email-based look-up against Rapleaf’s database, they are essentially helping make Rapleaf’s database more powerful.
Casting the Social Web
Verifying emails is one thing. But today far more valid social information is available, covering demographics, interests, location and so on, and a company like Rapleaf could use it to fill out its profiles. I’m as concerned about startups using Rapleaf’s API as I am about how the company continues to mine data from huge, data-rich social services such as LinkedIn. LinkedIn data is ending up on Rapleaf, and from there, it’s appearing on other services such as Flowtown. When I contacted LinkedIn, its spokesperson sent the following response:
As we’ve always said, our user data belongs to our users. It is provided by them and unless they have restricted it, is available on our site. We don’t share personally identifiable information with third parties without user consent. We also have teams that help protect our members’ professional profiles from scraping, spamming and any other activity that violates our terms of service. We don’t have any business relationship with Rapleaf.
However, LinkedIn data ends up at Rapleaf and, via Rapleaf, at other services through scraping of the publicly available data. Some people with knowledge of the subject believe that alternative tactics are being used to get around the API limitations of services such as LinkedIn. (If you know more, please get in touch with me.)
To be clear, I don’t have old-fashioned notions about privacy on the Internet. I know the realities of today’s Internet life. In order to enjoy the convenience of using web-based services, one has to make some sacrifices, and living socially online will eventually lead to an erosion of privacy. However, what I find egregious is how the information is surreptitiously collected all over the web, then aggregated to be sold, without us having any control or ability to look into that data. Sure, we can opt out, but only if we know that we’re being profiled. (Ironically, you have to register to opt out.)
I don’t want to blame only Rapleaf — ad networks are doing this as well, giving it cutesy names like behavioral targeting. U.S. Reps. Edward Markey (D-Mass.) and Joe Barton (R-Texas) recently sent a letter to Mark Zuckerberg and Facebook, questioning him about privacy breaches at the social network. In August 2010, these same congressmen asked for information from various web services on cookies and how they use them. Maybe they should consider looking at these data-collectors as well. Perhaps they will come to the conclusion that this industry needs some kind of oversight.
49 thoughts on “Rapleaf’s Web: How You Are Profiled on the Web”
Regarding “how we can improve and better protect consumers” … how about providing a public interface where individuals can verify whether or not the information which RapLeaf has collected is in fact correct?
My name is Irina Issakova and I work in Marketing at Rapleaf. Thanks for the comment! We actually have a way for people to see the information and manage it. You can go here to check it out:
We continue to make improvements and appreciate any suggestions. You can email our CEO Auren Hoffman directly at firstname.lastname@example.org.
My name is Irina Issakova and I work in Marketing at Rapleaf. Thanks for your suggestion! We actually have a page where people can see the information and manage it. Check it out here:
We continue to make improvements to this page and definitely appreciate suggestions. You can email Rapleaf CEO Auren Hoffman at email@example.com with ideas.
Thank you for your follow-up, Irina.
Yes, upon re-reading the article, I followed the link under the text “You can opt out …”.
I appreciate the ability to investigate the status of information held on file. My only comment after doing so has to do with the fact that registering my e-mail address with Rapleaf was required prior to viewing the information.
While I recognize that this requirement protects the holder of the e-mail account from public disclosure of Rapleaf’s “dossier” on a particular account, it also provides Rapleaf with an IP address which it can add to the information on file.
What is Rapleaf’s policy with regard to the disclosure or use of IP addresses?
That’s a great question about IP addresses. We do not collect, store or manage IP addresses.
Our CEO Auren Hoffman wrote about this in July:
Thank you for your reply.
In the article which you referenced, Mr. Hoffman writes:
“IP addresses should be thought of as privileged information. From our tests, IP addresses perfectly identify about 30% of U.S. households. That means that from IP address, a site can know your exact address.”
This raises a contradiction: you cannot run ‘tests’ which determine that “IP addresses perfectly identify about 30% of U.S. households” *without* collecting and storing IP addresses, yes?!
You actually can’t see most of the info Rapleaf has, even if you sign up. I did. Rapleaf only shows generic stuff, like that I like “social networking” — not the fact that it actually has my ID for Facebook, Flickr, Twitter, Livejournal, etc.
@Irina_Issakova: While most would appreciate the open tone of your comment about being able to view and opt out of the data you store, it’s extremely ironic that *registration* (i.e. submitting data to you) is required to do this.
As good as you make your company seem, its core is still rooted in an evil consumer data space.
Perhaps you are one of the better ones, but in any type of data gathering, especially the kind that leverages and exploits publicly available content and social profiles in order to sell insights or information to other companies, you simply can’t position yourself as “good.” When one reaches your scale, you are extremely “evil” because more insights can be extracted about more people; data = power.
Behavioral and profile-based targeting, or even something as simple as email verification, *requires* business practices which are questionable and which most consumers would object to. Default opt-in is a morally controversial practice in today’s technologically connected world. In this case, you are what you do, unfortunately.
You only sign NDAs if you’re trying to keep something quiet. What are they trying to hide? Are these companies ashamed? They should be!
I’ve just registered with rapleaf and looked at my profile for one of my email addresses. All I can say is… wow. They have a lot of data on me. I’ve opted out…
Great journalism, Om. Thanks.
Thanks for the kind words. And I agree, this whole NDA thing is a hairball and a lot more mess is hiding behind it. I am working on a follow-up post as well.
Most companies require NDAs with vendors to prevent their competitors from learning about the details or existence of the vendor relationship.
Signing an NDA in a customer/vendor relationship is not nearly as nefarious as you imply.
Enterprise software companies sign NDAs with customers all the time, as a matter of routine.
I don’t see much of a concern with what Rapleaf does as long as they don’t violate any laws or anything. The information they gather is out there for the taking. Facebook is probably a much bigger privacy concern because they have real data. And there are other websites such as the http://www.dirtyphonebook.com that are probably even worse threats to personal privacy. Google also seems to be dumping their “Do no evil” pledge and should be closely watched.
@Irina, I’m glad to see you mention that the suggestion about managing your personal information is a possibility. I didn’t know that about your service. Thanks.
This article misses the point.
The social networks have more value and scale faster if everyone is willing to make their information public. They also make this information available to search engines, because it helps them scale even faster. Search for someone’s name on google and you will see their linkedin page. Enter an email in facebook and you will see the public information on this person. This information is public.
The problem isn’t Rapleaf and other companies that scrape publicly available data on the internet. In fact, google’s whole business model is based on scraping content that does not belong to them.
The problem is that this information is public, that it was in the social networks’ best interest to make it public, and that people are more interested in connecting with other people and do not grasp the implications of making their information public.
Om, good stuff. Rapleaf is run by a Russian-born entrepreneur and has a decidedly Russian business plan. I have issues with their lack of transparency and feel like it’s the next Offerpal-like story built around Facebook. The data that’s important here is Facebook’s, not the other networks’, and Facebook needs to police and authenticate this in building its brand.
Just as an experiment, I’ve decided to completely opt out (I think) of Rapleaf’s database. I’m not so concerned about privacy, security, etc. I just want to see if I notice what will happen with display ads, etc.
So how will you be able to tell a difference?
I’m actually not sure. Will I be served relevant display ads? Will I not be retargeted anymore? (I find retargeted ads annoying.) Can anyone point to a downside of opting out of Rapleaf altogether?
Pete Warden (of Facebook crawling fame) built a tool a while ago that pulls up what companies can see when they search for you via email. It’s no frills but pretty scary.
The code can be found at http://github.com/petewarden/findbyemail
He also wrote about Rapleaf as well as tracking earlier this year.
Interesting reads, I think.
Indeed. I have been following his stuff for a long time and it is definitely interesting. Thanks for sharing.
om, this is an awesome post. thanks for bringing the attention to rapleaf. for sure there are many others.
i use multiple emails. in the past i had the urge to use an email with different names. what had hit me at the time was that rapleaf had my real name in their database with this specific email address -> I wrote about it in 2007: http://andreinchile.com/2007/09/03/the-internet-and-privacy-get-yourself-removed-from-rapleaf-quickly/
in 2007 rapleaf gathered a lot of email addresses by allowing users to upload their whole address book to find out info about their friends. i got quite a few emails telling me that someone had checked me out on rapleaf…
Europeans would be protected from this kind of behaviour by the data protection act, but the internet knows no borders…
please om. stay on this story and let us know what you find out.
Regarding the NDAs: I think you’re blowing it out of proportion. My company has to sign NDAs with our partners all the time, just so we can be told who their clients are and how their back-end databases are built so we can set up the integration. It’s not designed to “hide malicious activity” – it’s designed to protect a company’s Intellectual Property and client names, and to make them feel comfortable opening the kimono a bit more.
It seems like the goal of a lot of these companies who use this data is to reduce the amount of spam and crap that you get and make sure you don’t see the same generic “obama refinance your mortgage” banners, but rather something that you might actually be interested in.
If advertising revenue is the only way that a lot of web services are going to make money, and consumers want to continue to enjoy these services for free, then wouldn’t you at least want better ads that you’re more likely to be interested in? Which in turn will increase the clicks on the ads, increasing revenues, increasing features/service, etc.
Yes, there should be transparency and users should be able to easily opt-out, but it seems like a fair trade off to keep things free/low-cost.
And today you have the ability to opt-out of this via network-specific privacy controls, as well as via Rapleaf’s site. And the registration on Rapleaf to opt-out makes sense. Otherwise companies could simply ping these pages with an email address in the guise of “opting out” and that would immediately verify that it’s a valid address in the database. I agree that if you don’t know who Rapleaf is, then how do you know to opt-out. That’s something to solve.
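The commenter's argument about registration can be made concrete. If an opt-out page answers differently depending on whether an address is on file, anyone can use it as a free verification service. The sketch below compares a hypothetical "leaky" opt-out handler with one that gives the same answer for every address and sends details only to the mailbox itself, which is roughly what requiring registration or email confirmation achieves. The handler names, messages and the `PROFILED` set are all hypothetical, and this does not describe how Rapleaf's page actually works.

```python
# Hypothetical set of addresses a data broker has profiled.
PROFILED = {"alice@example.com"}
BASELINE = "nobody@example.invalid"  # an address assumed not to be on file


def leaky_optout(email: str) -> str:
    """Unauthenticated opt-out page whose answer depends on membership."""
    if email.lower() in PROFILED:
        return "We found a profile for this address. Click to remove it."
    return "We have no profile for this address."


def confirm_first_optout(email: str) -> str:
    """Same answer for every address; details go only to the mailbox itself."""
    return "Check your inbox: we sent a confirmation link to this address."


def reveals_membership(handler, email: str) -> bool:
    """A scraper's probe: does the response differ from a known-absent address?"""
    return handler(email) != handler(BASELINE)


print(reveals_membership(leaky_optout, "alice@example.com"))          # True
print(reveals_membership(leaky_optout, "bob@example.com"))            # False
print(reveals_membership(confirm_first_optout, "alice@example.com"))  # False
```

The trade-off is the one debated in this thread. Proving ownership of the mailbox closes the probe, but it also means the broker learns something new from the person opting out.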
On NDAs: You’re exactly right, they’re designed to protect Intellectual Property. But in this case the NDA is being used to obscure which companies have supplied my data (or the other way around) to Rapleaf.
As for opt-outs: as you point out, an opt-out is all very well. But until this week I’d never heard of Rapleaf. And it turns out they have a ton of data on me! Users cannot be expected to opt out of services they’ve never heard of. It’s more than “something to solve”; it’s a problem.
How long would Rapleaf leave up the opt-out option if someone were to buy a bunch of email lists and automate the process of submitting every one of them through the opt-out form?
Thanks for covering this, Om.
And what if I register and start using an email address previously owned and used by someone else who then let it expire? This happens all the time with services like Hotmail, Yahoo, etc. where expired email addresses are made available again after a certain amount of time.
Om: I agree this is important. Keep at it. Thanks.
Making a query for an email address certainly is “passing data,” it’s how Woodward & Bernstein got information about Watergate from Deep Throat.
@Man Ching: while Irina may be flattered by your assumption that she founded and/or runs Rapleaf, she is far from being the mastermind of RapLeaf’s business plan. I also find your comment about her background highly inappropriate and irrelevant to this discussion.
Whoa there, tiger.
99% of what’s called behavioral targeting includes no personally identifiable information (PII), e.g. names and email addresses. Those of us who have worked in that field don’t think it’s reasonable or legal to casually and quasi-secretly cross that line as Rapleaf did for its first few years.
And, most of what is actually called behavioral targeting in the US is legal in Europe. The EU and a couple of its members have an expanded definition of PII, and implementing it is simply best-practice data processing here as well. The sloppy 80% of online marketing companies get tripped up by those laws; the other 20% do the extra two days of work up front and have no problem.
Unfortunately, there are many algorithms and companies that specialize in taking non-personally identifiable information (non-PII) and using big data to tie it back to real people with great accuracy (above 90%).
You may have a profile which you think is anonymous, yet the scale of data along with smart systems could still extract who you are.
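The basic technique behind this kind of re-identification is a linkage join. An "anonymous" profile is matched to a named record on quasi-identifiers such as ZIP code, birth year and gender. This minimal sketch uses made-up data and an exact-match join, and it treats only a unique match as a re-identification. Real systems use far larger datasets and probabilistic matching, and the sketch does not reproduce the commenter's accuracy figure.

```python
# Hypothetical "anonymous" profiles: no names or emails, only quasi-identifiers.
anonymous = [
    {"id": "u1", "zip": "94107", "birth_year": 1980, "gender": "F", "interest": "cycling"},
    {"id": "u2", "zip": "10001", "birth_year": 1975, "gender": "M", "interest": "golf"},
]

# Hypothetical public records that do carry names.
public = [
    {"name": "Jane Roe", "zip": "94107", "birth_year": 1980, "gender": "F"},
    {"name": "John Doe", "zip": "10001", "birth_year": 1975, "gender": "M"},
    {"name": "Jim Poe", "zip": "10001", "birth_year": 1975, "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def reidentify(anon_rows, public_rows):
    """Map anonymous ids to names when the quasi-identifiers match exactly one person."""
    index = {}
    for row in public_rows:
        key = tuple(row[k] for k in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(row["name"])
    matches = {}
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in QUASI_IDENTIFIERS), [])
        if len(candidates) == 1:  # a unique match re-identifies the profile
            matches[row["id"]] = candidates[0]
    return matches


print(reidentify(anonymous, public))  # {'u1': 'Jane Roe'}
```

Profile `u2` survives only because two people share its attributes. Adding one more public attribute to the join would often be enough to single it out as well.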
On the public data side, it’s not just companies like Rapleaf; a simple Google search finds lots of info on “Irina Issakova,” who works at Rapleaf and commented above:
I’m sure she doesn’t mind her information being displayed publicly here, as she submitted it to a public service and considering the business she is in.
It’s very easy to do as you say, but you’re missing my point. I’ve run one of those systems. You can buy the data, etc. (we didn’t!), but the only companies that do tie anonymous data back to an identifiable human are run by people who know that they are engaging in unethical practices. Often, it’s tough for rank-and-file employees to get the picture, but the execs know.
The number of execs willing to do this sort of thing, and also willing to risk their shareholders’ investment on not getting caught, is a small minority. The issue is, as usual, bringing the bad apples into the light. Existing FTC code is sufficient to manage the problem; it just needs to be used.
Awesome & insightful post, Om.
There was a thread on HN where someone asked the Rapportive guys how they can scale to so many users and pay Rapleaf 5 cents per lookup. That question went unanswered. I can’t imagine that Rapportive is paying 5 cents per lookup, as they’d go bankrupt in no time at that rate, since their core service is free. It’s largely nothing more than showing social data based on a Rapleaf lookup.
Whether or not these companies are passing data to Rapleaf, it seems likely that they are paying little or nothing at all, which suggests Rapleaf has something to gain from the arrangement.
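The commenter's skepticism can be checked with rough numbers. This sketch combines the 5-cents-per-lookup figure from the HN thread, which is unconfirmed, with the 1,000-queries-per-month free tier from Rapleaf's answer quoted above. The monthly lookup volumes are hypothetical, and the real terms of any deal are covered by NDAs.

```python
FREE_QUERIES = 1000      # free monthly tier, per Rapleaf's answer quoted above
PRICE_PER_LOOKUP = 0.05  # figure cited in the HN thread; not confirmed by Rapleaf


def monthly_bill(lookups: int) -> float:
    """Dollar cost of one month's lookups under the assumed pricing."""
    billable = max(0, lookups - FREE_QUERIES)
    return billable * PRICE_PER_LOOKUP


# Hypothetical monthly lookup volumes for a free, fast-growing service.
for volume in (1_000, 100_000, 10_000_000):
    print(f"{volume:,} lookups -> ${monthly_bill(volume):,.2f}")
```

At that rate, 100,000 lookups a month would cost $4,950 and 10 million would cost $499,950. That is hard to reconcile with a free product, which is the commenter's point.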
Looking forward to the next post on this.
For startups that accept new registrations with email verification, it seems that RapLeaf provides a legitimate way to prevent, or at least decrease, the number of spammy, fake email registrations. This is a valuable service to startups that want to make registration simple but don’t want the fraud that simplicity sometimes brings along with it.
What about the inevitable tradeoff? Yes, startups are helping to build RapLeaf’s database. But, this seems somewhat analogous to how Equifax works. Equifax’s customers make their database better, too.
What matters is not that RapLeaf has a quid pro quo relationship with its customers regarding data, but rather what RapLeaf does with the data. There are good uses (reducing fraud) and potentially bad uses. The bad uses should be dealt with by regulators while the good uses should continue to be available and utilized.
I am happy that you were able to get real responses from these companies about how this information is used. You point out several important points about how data is used, even if it is not shared.