49 thoughts on “Rapleaf’s Web: How You Are Profiled on the Web”

  1. Regarding “how we can improve and better protect consumers” … how about providing a public interface where individuals can verify whether or not the information which RapLeaf has collected is in fact correct ?

    1. Thank you for your follow-up, Irina.

      Yes, upon re-reading the article, I followed the link under the text “You can opt out …”.

      I appreciate the ability to investigate the status of information held on file. My only comment after doing so has to do with the fact that registering my e-mail address with Rapleaf was required prior to viewing the information.

      While I recognize that this requirement protects the holder of the e-mail account from public disclosure of Rapleaf’s “dossier” on a particular account, it also provides Rapleaf with an IP address which it can add to the information on file.

      What is the policy of Rapleaf with regard to the disclosure or use of IP addresses ?

      1. @IrinaIssakova:

        Thank you for your reply.

        In the article which you referenced, Mr. Hoffman writes:

        “IP addresses should be thought of as privileged information. From our tests, IP addresses perfectly identify about 30% of U.S. households. That means that from IP address, a site can know your exact address.”

        This raises a contradiction: you cannot run ‘tests’ which determine that “IP addresses perfectly identify about 30% of U.S. households” *without* collecting and storing IP addresses, yes ?!

    2. You actually can’t see most of the info Rapleaf has, even if you sign up. I did. Rapleaf only shows generic stuff, like that I like “social networking” — not the fact that it actually has my ID for Facebook, Flickr, Twitter, Livejournal, etc.

    3. @Irina_Issakova: While most would appreciate the open tone in your comment regarding your service’s ability to view and opt out of the data you store, it’s extremely ironic that *registration* (i.e. submitting data to you) is required to do this.

      As good as you make your company seem, its core is still rooted in an evil consumer data space.

      Perhaps you are one of the better ones, but with any type of data gathering, especially the kind that leverages and exploits publicly available content and social profiles for the purposes of selling insights or information to other companies, you simply can’t position yourself with any “good” sentiment. When one reaches your scale, you are extremely “evil” because more insights can be extracted about more people; data = power.

      Behavioral and profile-based targeting, or even something as simple as email verification, *requires* business practices which are questionable and that most consumers would oppose. Default opt-in is a morally controversial activity in today’s technologically connected world. In this case, unfortunately, you are what you do.

  2. “We sign NDAs with our customers so we cannot release their names”

    You only sign NDAs if you’re trying to keep something quiet. What are they trying to hide? Are these companies ashamed? They should be!

    I’ve just registered with Rapleaf and looked at my profile for one of my email addresses. All I can say is… wow. They have a lot of data on me. I’ve opted out…

    Great journalism, Om. Thanks.

    1. Pete

      Thanks for the kind words. And I agree, this whole NDA thing is a hairball and a lot more mess is hiding behind it. I am working on a follow-up post as well.

    2. Most companies require NDAs with vendors to prevent their competitors from learning about the details or existence of the vendor relationship.

      Signing an NDA in a customer/vendor relationship is not nearly as nefarious as you imply.

      Enterprise software companies sign NDAs with customers all the time, as a matter of routine.

  3. I don’t see much of a concern with what Rapleaf does as long as they don’t violate any laws or anything. The information they gather is out there for the taking. Facebook is probably a much bigger privacy concern because they have real data. And there are other websites such as http://www.dirtyphonebook.com that are probably even worse threats to personal privacy. Google also seems to be dumping their “Do no evil” pledge and should be closely watched.

    @Irina, I’m glad to see you mention that the suggestion about managing your personal information is a possibility. I didn’t know that about your service. Thanks.

  4. This article misses the point.

    The social networks have more value and scale faster if everyone is willing to make their information public. They also make this information available to search engines, because it helps them scale even faster. Search for someone’s name on Google and you will see their LinkedIn page. Enter an email in Facebook and you will see the public information on this person. This information is public.

    The problem isn’t Rapleaf and other companies that scrape publicly available data on the internet. In fact, Google’s whole business model is based on scraping content that does not belong to them.

    The problem is that this information is public, that it was in the social networks’ best interest to make it public, and that people are more interested in connecting with other people and do not grasp the implications of making their information public.

  5. Om, good stuff. Rapleaf is run by a Russian-born entrepreneur and has a decidedly Russian business plan. I have issues with their lack of transparency and feel like it’s the next Offerpal-like story built around Facebook. The data that is important here is Facebook’s, not the other networks’, and Facebook needs to police and authenticate this in building its brand.

  6. Just as an experiment, I’ve decided to completely opt out (I think) of Rapleaf’s database. I’m not so concerned about privacy, security, etc. I just want to see if I notice what will happen with display ads, etc.

      1. I’m actually not sure. Will I be served relevant display ads? Will I not be retargeted anymore? (I find retargeted ads annoying). Can anyone point to a downside of opting out of Rapleaf altogether?

  7. Pete Warden (of Facebook crawling fame) built a tool a while ago that pulls up what companies can see when they search for you via email. It’s no frills but pretty scary.

    http://web.mailana.com/labs/findbyemail/

    The code can be found at http://github.com/petewarden/findbyemail

    He also wrote about Rapleaf as well as tracking earlier this year.

    http://petewarden.typepad.com/searchbrowser/2010/03/all-the-cool-kids-are-using-the-rapleaf-api.html
    http://petewarden.typepad.com/searchbrowser/2010/03/the-unknown-marketing-databases-that-know-everything-about-you.html

    Interesting reads, I think.

  8. om, this is an awesome post. thanks for bringing the attention to rapleaf. for sure there are many others.

    i use multiple emails. in the past i had the urge to use an email with different names. what had hit me at the time was that rapleaf had my real name in their database with this specific email address -> I wrote about it in 2007: http://andreinchile.com/2007/09/03/the-internet-and-privacy-get-yourself-removed-from-rapleaf-quickly/

    in 2007 rapleaf gathered a lot of email addresses by allowing users to upload their entire address book to find out info about their friends. i got quite a few emails which told me that someone had checked me out on rapleaf…

    Europeans would be protected from this kind of behaviour through the data protection act but the internet knows no borders…

    please om. stay on this story and let us know what you find out.
    thanks

  9. Regarding the NDAs: I think you’re blowing it out of proportion. My company has to sign NDAs with our partners all the time, just so we can be told who their clients are and how their back-end databases are built so we can set up the integration. It’s not designed to “hide malicious activity” – it’s designed to protect a company’s Intellectual Property and client names, and to make them feel comfortable opening the kimono a bit more.

    On the whole, I think this is being blown out of proportion. This is public information that you choose to put onto Facebook or LinkedIn, and if Google/Bing can index it, why can’t anyone else? (though, yes, if they are bypassing someone like LinkedIn’s terms of use, then that’s an issue)

    It seems like the goal of a lot of these companies who use this data is to reduce the amount of spam and crap that you get and make sure you don’t see the same generic “obama refinance your mortgage” banners, but rather something that you might actually be interested in.

    If advertising revenue is the only way that a lot of web services are going to make money, and consumers want to continue to enjoy these services for free, then wouldn’t you at least want better ads that you’re more likely to be interested in? Which in turn will increase the clicks on the ads, increasing revenues, increasing features/service, etc.

    Yes, there should be transparency and users should be able to easily opt-out, but it seems like a fair trade off to keep things free/low-cost.

    And today you have the ability to opt out of this via network-specific privacy controls, as well as via Rapleaf’s site. And the registration on Rapleaf to opt out makes sense. Otherwise companies could simply ping these pages with an email address in the guise of “opting out” and that would immediately verify that it’s a valid address in the database. I agree that if you don’t know who Rapleaf is, how would you know to opt out? That’s something to solve.

    1. On NDAs: You’re exactly right, they’re designed to protect Intellectual Property. But in this case the NDA is being used to obscure which companies have supplied my data (or the other way around) to Rapleaf.

      On opt-outs: as you point out, an opt-out is all very well. But until this week I’d never heard of Rapleaf. And it turns out they have a ton of data on me! Users cannot be expected to opt out of services they’ve never heard of. It’s more than “something to solve”; it’s a problem.

  11. And what if I register and start using an email address previously owned and used by someone else who then let it expire? This happens all the time with services like Hotmail, Yahoo, etc. where expired email addresses are made available again after a certain amount of time.

  12. Making a query for an email address certainly is “passing data,” it’s how Woodward & Bernstein got information about Watergate from Deep Throat.

  13. @Man Ching: while Irina may be flattered by your assumption that she founded and/or runs Rapleaf, she is far from being the mastermind of RapLeaf’s business plan. I also find your comment as to her background highly inappropriate and irrelevant to this discussion.

  14. Whoa there, tiger.

    99% of what’s called behavioral targeting includes no personally identifiable information (PII), e.g. names and email addresses. Those of us who have worked in that field don’t think it’s reasonable or legal to casually and quasi-secretly cross that line as Rapleaf did for its first few years.

    And most of what is actually called behavioral targeting in the US is legal in Europe. The EU and a couple of its members have an expanded definition of PII, the implementation of which is simply best-practice data processing here as well. The sloppy 80% of online marketing companies get tripped up by those laws; the other 20% do the extra 2 days of work up front and have no problem.

    1. Unfortunately, there are many algorithms, and companies that specialize in taking non-personally identifiable information (non-PII) and using big data to tie it back to real people with great accuracy (above 90%).

      You may have a profile which you think is anonymous, yet the scale of data along with smart systems could still extract who you are.

      On the public data side, it’s not just companies like Rapleaf; a simple Google search finds lots of info on “Irina Issakova”, who works at Rapleaf and commented above:

      http://www.linkedin.com/in/irinaissakova

      I’m sure she doesn’t mind her information being displayed publicly here, as she submitted it to a public service and considering the business she is in.

      1. It’s very easy to do as you say, but you’re missing my point. I’ve run one of those systems. You can buy the data, etc. (we didn’t!), but the only companies that do tie anonymous data back to an identifiable human are run by people who know that they are engaging in unethical practices. Often, it’s tough for rank-and-file employees to get the picture, but the execs know.

        The number of execs willing to do this sort of thing, and also willing to risk their shareholders’ investment on not getting caught, is a small minority. The issue is, as usual, bringing the bad apples into the light. Existing FTC code is fine to manage the problem; it just needs to be used.

  15. Awesome & insightful post, Om.

    There was a thread on HN where someone asked the Rapportive guys how they can scale to so many users and pay Rapleaf 5 cents per lookup. That question went unanswered. I can’t imagine that Rapportive is paying 5 cents per lookup, as they’d go bankrupt in no time at that rate given that their core service is free. It’s largely nothing more than showing social data based on a Rapleaf lookup.

    Whether or not these companies are passing data to Rapleaf, it seems likely that they are paying little or nothing at all, which suggests Rapleaf has something to gain from that.

    Looking forward to the next post on this.

  16. For startups that accept new registrations with email verification, it seems that RapLeaf provides a legitimate way to prevent, or at least decrease, the number of spammy, fake email registrations. This is a valuable service to startups that want to make registration simple but don’t want the fraud that simplicity sometimes brings along with it.

    What about the inevitable tradeoff? Yes, startups are helping to build RapLeaf’s database. But, this seems somewhat analogous to how Equifax works. Equifax’s customers make their database better, too.

    What matters is not that RapLeaf has a quid pro quo relationship with its customers regarding data, but rather what RapLeaf does with the data. There are good uses (reducing fraud) and potentially bad uses. The bad uses should be dealt with by regulators while the good uses should continue to be available and utilized.

  17. I am happy that you were able to get real responses from these companies about how this information is used. You make several important points about how data is used, even if it is not shared.

    Thank you
