Updated: Faced with declining revenues and increasingly dismal prospects, some mainstream media outlets are adopting questionable tactics, specifically dead-end web pages stuffed with outbound links and pay-per-click ads. A liberally funded LA startup is only too quick to help them. The story starts with San Francisco-based sex writer Violet Blue. She used to be a columnist for the San Francisco Chronicle, the SF daily with ever-declining circulation.
Recently, while writing a column, she searched the archives of SFGate.com, the Chron’s online presence. She discovered that the website was “copying” and “distorting” her column archives. (Here’s the link. Warning: Not Safe for Work.) Here’s how she describes what she saw:
The column had been stripped of all links, and divided across several pages. My bio was missing, as were all the comments. Freakishly, all the commas were gone. And the URL had been changed. The address was composed of words; to my horror the URL had been keyworded to say “ashamed porn star,” the exact opposite of the article’s content. There is a much bigger story here. It’s all in what’s going on with archive duplication and the nation’s old media newspapers online. I think that the work done to the duped content is done for the purpose of SEO (Search Engine Optimization). The idea here seems to be stripping content, duplicating it, making SEO’d content that is a dead end for readers, and driving up results with cost per click ads.
The San Francisco Chronicle, it seems, like the Los Angeles Times, is using the technology of an LA-based startup, Perfect Market, which has raised $20 million from Trinity Ventures, Rustic Canyon Ventures and others. Tim Oren, a venture capitalist at The Pacifica Fund, on his blog, Due Diligence, points out that while there’s nothing illegal about what the newspapers are doing, it does border on scraping. Typically, spammers scrape web sites, then set up shadow blogs and fill them with pay-per-click ads. As Oren writes:
The keyword and ad-stuffed dead end pages apparently produced by Perfect Market’s technology are isomorphic, from a search company’s point of view, to those created by more questionable tactics such as scraping. The intent is the same: to spam the index. This is the behavior that routinely gets questionable sites shoved to Google’s back pages, or banished altogether. One has to wonder just how long this type of abuse will be tolerated, simply because it’s being practiced by a recognized media outlet.
I couldn’t agree more. Nor could I help but notice the irony, considering how quick the mainstream media is to lament the traffic-stealer that is Google. It wouldn’t surprise me if more newspapers adopted these kinds of strategies.
Update #1: A reader, after some sleuthing, points out that Perfect Market may also be working with the LA Times, Baltimore Sun, Orlando Sentinel, South Florida Sun-Sentinel, Hartford Courant, Allentown Morning Call, Virginia Daily Press and the New York Daily News.
Update #2: Perfect Market CEO Julie Schoenfeld responds:
Perfect Market has been working over the past year to increase revenue for newspapers through search and social media and we have had wonderful success. We are actively working with our partners to delight our customers and users with innovative new content experiences. In the meantime, there are factual errors being perpetuated about our services that we would be remiss to leave unaddressed.
Here are the facts:
* Perfect Market serves up professionally produced news articles on the major search engines and only works with high-quality publishers. Our pages contain highly professional editorial content, representing decades of careful work by journalists and writers who work for publishers to produce quality content. We are held to a high standard by our publishers to preserve the integrity and quality of content we publish online, and we hold ourselves to a high standard.
* We are not ‘scraping,’ ‘spamming,’ ‘keyword-stuffing’ or ‘duplicating content.’ Spammers attempt to surface pages with little meaningful content, whereas Perfect Market simply manages the search experience for publishers. We provide contextual navigation to relevant related content and topics so the user can browse the publisher’s vast content library rather than hitting dead ends. Content is not unreadable. Quite the contrary, it is out in the open and accessible to all, and oftentimes more accessible than ever before.
* We know how traffic from search engines and properly targeted CPC ads can generate big revenues. We are bringing those learnings to high-quality publishers so they can more fully participate in the vibrant internet ecosystem.
The whole SEO saga has seen high drama from the very first chapter. This is just a new chapter of this long story. I am pretty sure that in the next 5-10 years, there is going to be another interesting chapter. Somebody is going to sue Google, and a dubiously tech-unsavvy court is going to declare that Google and other search engines should publish the exact algorithm by which they rank search results so that all websites “have an equal opportunity to compete on the same level for search engine ranking”.
Om, I did a bit of the digging for Violet on this (she’s my partner). To your point about other newspapers using Perfect Market, I discovered the following hostnames tied to IP addresses used by Perfect Market’s systems:
64.211.62.100 – articles.latimes.com
64.211.62.105 – articles.baltimoresun.com
64.211.62.110 – articles.orlandosentinel.com
64.211.62.115 – articles.sun-sentinel.com
64.211.62.120 – articles.courant.com
64.211.62.125 – articles.mcall.com
64.211.62.130 – articles.dailypress.com
64.211.62.135 – articles.sfgate.com
64.211.62.140 – articles.nydailynews.com (offline)
I obtained this by finding the IP address used by SFGate’s Perfect Market subsite and then performing a reverse-lookup of neighboring IP addresses.
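Ben’s technique, taking the IP address behind one Perfect Market subsite and reverse-resolving the addresses around it, can be sketched in a few lines of Python. This is a minimal illustration, assuming “reverse-lookup” means an ordinary PTR (reverse DNS) query through the standard library’s `socket.gethostbyaddr`; Ben may instead have used a reverse-IP lookup service, which reports the virtual hosts served from an address. The function names and the `radius` parameter are hypothetical, and the addresses listed above may well have been reassigned since, so no particular output is implied.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterator, Optional


def neighbor_ips(seed: str, radius: int) -> Iterator[str]:
    """Yield the IPv4 addresses within `radius` of `seed`, excluding `seed` itself."""
    base = int(ipaddress.IPv4Address(seed))
    for offset in range(-radius, radius + 1):
        value = base + offset
        if offset != 0 and 0 <= value <= 0xFFFFFFFF:  # stay inside the IPv4 space
            yield str(ipaddress.IPv4Address(value))


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR (reverse DNS) hostname for `ip`, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return None


def scan_neighbors(
    seed: str,
    radius: int = 10,
    lookup: Callable[[str], Optional[str]] = reverse_lookup,
) -> Dict[str, str]:
    """Map each neighboring address that resolves to a hostname onto that hostname."""
    found: Dict[str, str] = {}
    for ip in neighbor_ips(seed, radius):
        hostname = lookup(ip)
        if hostname is not None:
            found[ip] = hostname
    return found
```

For example, `scan_neighbors("64.211.62.135")` would query the ten addresses on either side of the one Ben lists for SFGate’s subsite. The `lookup` parameter is there so the scan can be tested offline with a stub, or pointed at a different resolver.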
It looks like the LA Times, Baltimore Sun, Orlando Sentinel, South Florida Sun-Sentinel, Hartford Courant, Allentown Morning Call, Virginia Daily Press and the New York Daily News are all using this or looking to.
Interestingly, many of these papers are Tribune outlets – Tribune just participated in a round of funding for Perfect Market and I’m wondering if this is something they intend to roll out more widely across their portfolio.
I should probably blog about this, but thought you and your readers would be interested in this additional information.
Ben
Thanks for the update after your investigation. It is seriously strange to see these companies so careless about their own brand and stature. By the way, not all of these papers are Tribune outlets.
Google is getting spammed from too many angles to handle. They only deal with the worst of the worst. Spamming is now part of the web.
How far the mighty have fallen.
Big media is the new black hat SEO.
The problem is Google will basically let any large company do anything they want without punishment. They play by a different set of rules.
Gary
Interesting point of view. Why do you say that? I mean, if this is clearly a breach, why would Google let them get away with it? I would love to know more about your reasons.
The recent exposés seobook.com published about mahal0.com have led many SEO-type folks to believe that a relationship with high-ranking Googlers can get you preferential treatment.
Hi, there:
This is a really interesting topic that helped blow up a Windy Citizen discussion thread last week following an interesting post from a former ChicagoNow blogger. We invite you to check it out, because there are many examples and anecdotes relevant to what you’re talking about here.
Thanks!
I am amazed that Google tolerates this kind of behaviour, but on the other hand it clearly demonstrates that print media does not understand how this game is played, and why they have failed on the net for the past 10-15 years.
Most people will stoop to any level to pick up a penny, as long as it’s lying there in the open, with no obvious owner and no awful onus attached to it.
Someone build me a spambot-app that lets me spam every website that offends me. In moments the offender will be calling his lawyer to drag me into court to explain why I did it.
“It’s easy, your honor. The SOB stole my time, my attention, and, directly or indirectly, cost me money, all to make money he has no intention of sharing with me.”
If such a retaliatory spambot-app existed, I would pay the app store a handsome price for that bot. They would pay a handsome royalty to the app developer. Legitimate business interests would lobby for legislation to “ban the avenging spambot app.” Odd that they wouldn’t think to stop the offending practice first. “Oh, what tangled webs we weave….” -dh
Newspapers as cyberpunks. Love it!
But for some reason, I can’t get that old Far Side image out of my head; the one where the gang of dinosaurs are huddled together, smoking cigarettes.
Death is rarely pretty.
Have any of you looked at SFgate archive pages?
They are not scraping or duplicating anything. The articles are their own; it looks to me like they are just moving them and republishing them in an archive template. They might be changing the title of the page based on some automated process, but that is not spamming. It’s just that writers and bloggers often have vague headlines, which are fine in print, but on the web you need to use titles people actually search for. So “Dog Bites Man,” not “Canine Snacks on Human.”
The pages have fewer ads than their main pages; it’s just that they are mostly Google AdSense ones.
They might be stripping out some repeat text like biogs and such as that would make the pages less focused in terms of search. But that is a good practice, search engines want to be able to find relevant articles of high quality.
There seems to be no change in the actual copy so they are not keyword stuffing.
I would say the worst offenders for spamming are bloggers with every SEO plugin switched on, generating thousands of near-empty tag pages.
This is a non-story.
“There seems to be no change in the actual copy so they are not keyword stuffing.”
Have you looked at Violet Blue’s actual complaint, which kicked this whole discussion off? I don’t think you have.
Still no explanation from “James” or Julie regarding why titles were changed (to completely different meanings), punctuation stripped or links removed.
The link removal is the part that is more bizarre (and gray).
Titles were in fact never changed. Commas were missing on a very small number of articles. That has been corrected. Some articles used a nonstandard character code for the comma that was incorrectly transcoded in content feeds.
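The transcoding explanation is plausible: if a feed forces text into plain ASCII and silently discards characters it cannot encode, a lookalike comma simply vanishes. The Python below is a hypothetical illustration of that failure mode and one way to avoid it. The comment does not say which code point the articles used or how the feeds were actually processed, so the characters in `COMMA_LOOKALIKES` are illustrative guesses, and both function names are made up for this sketch.

```python
# Hypothetical lookalikes a CMS might store instead of ASCII "," (U+002C).
# The source does not name the actual code point; these are illustrative only.
COMMA_LOOKALIKES = {
    "\u201a": ",",  # SINGLE LOW-9 QUOTATION MARK (byte 0x82 in Windows-1252)
    "\uff0c": ",",  # FULLWIDTH COMMA
    "\u060c": ",",  # ARABIC COMMA
}
COMMA_TABLE = str.maketrans(COMMA_LOOKALIKES)


def naive_transcode(text: str) -> str:
    """Force text to ASCII the careless way: unencodable characters are silently dropped."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def safer_transcode(text: str) -> str:
    """Map known comma lookalikes to ',' and keep any other non-ASCII as &#NNN; references."""
    mapped = text.translate(COMMA_TABLE)
    return mapped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

With `naive_transcode`, a sentence containing U+201A loses its comma exactly as Violet Blue described, while `safer_transcode` restores it and turns any remaining non-ASCII characters into numeric character references suitable for an HTML or XML feed. Keeping the feed in UTF-8 end to end would avoid the problem altogether; the mapping step only matters if ASCII output is a hard requirement.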
We are also now including editorial links. Originally, we had disabled links because on evergreen or archive stories many links were broken and we decided it would be a better user experience without so many dead ends/404 errors.
After further consideration and discussions with our customers we made the decision to restore the links.
We still do not think it is an ideal user experience to have so many broken links. However we have not implemented a comprehensive solution to that yet.
James, I don’t know who you are, but your comment is right on!
Several of us from Perfect Market are at SXSW. If you are here and want to talk to us, feel free to give us a shout on Twitter. We are @perfectmarket
If your concern is a few old links breaking, then surely linking to the Archive.org version of a page is better than removing ALL links. Further, it would not be technically difficult to at least check each link’s status code, keep the links that still work, and flag 404s and the like for review.
Removing all links from the articles just lowers their utility. And it is not a good longterm business practice if they want people to keep linking at their articles.
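The commenter’s proposal is simple to sketch. The Python below keeps links that still resolve, rewrites dead or unreachable ones to point at the Internet Archive’s Wayback Machine, and returns the flagged URLs for editorial review. It is a minimal sketch under stated assumptions: links are matched with a deliberately simple regular expression (a production pipeline would use a real HTML parser), status codes come from a HEAD request, and the `https://web.archive.org/web/<url>` form is assumed to redirect to the most recent capture, if one exists. The function names are hypothetical.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple

# Deliberately simple: only double-quoted, absolute http(s) hrefs are matched.
HREF_RE = re.compile(r'href="(https?://[^"]+)"')


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status for `url` (redirects are followed), or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code
    except OSError:  # URLError, DNS failures and timeouts are all OSError subclasses
        return None


def wayback_url(url: str) -> str:
    """Wayback Machine address assumed to redirect to the latest snapshot of `url`."""
    return "https://web.archive.org/web/" + url


def repair_links(
    html: str, status: Callable[[str], Optional[int]] = http_status
) -> Tuple[str, List[Tuple[str, Optional[int]]]]:
    """Keep working links, point dead ones at the Wayback Machine, and flag them for review."""
    flagged: List[Tuple[str, Optional[int]]] = []

    def fix(match: "re.Match[str]") -> str:
        url = match.group(1)
        code = status(url)
        if code is not None and code < 400:
            return match.group(0)  # link still works: leave it untouched
        flagged.append((url, code))  # dead or unreachable: record it for an editor
        return 'href="' + wayback_url(url) + '"'

    return HREF_RE.sub(fix, html), flagged
```

Passing a stub for `status` makes the logic testable offline. Some servers answer HEAD with 405, so a real checker would likely retry those with GET before flagging them, and it might also confirm that an archived snapshot actually exists; the sketch treats any status of 400 or above as dead.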
They took the commas out of her article! Call the police!
Scandalous! Some people will go to great lengths to make money; ripping off people’s work and profiting from other people in the process is just evil!
Disappointing to hear that major media outlets are resorting to these types of tactics. I suppose they are trying to keep content fresh and ensure that there is enough content available.
I hate all this SEO rubbish; I hope it will not matter once Google Instant is fully in place.