Categories
Googleplex

reCAPTCHA definition and history

What does a CAPTCHA do?

Humans can read the distorted text in CAPTCHA challenges* but current computer programs cannot.

A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot.

What does CAPTCHA mean?

CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. The term was coined in 2000 by the Carnegie Mellon University computer science researchers who invented the original CAPTCHA.

What is the difference between CAPTCHA and reCAPTCHA?

This is how the reCAPTCHA Project explains the difference:

ReCAPTCHA helps prevent automated abuse of your site (such as comment spam or bogus registrations) by using a CAPTCHA to ensure that only humans perform certain actions.

Generally a CAPTCHA is a single word, whereas a reCAPTCHA is two words. The reCAPTCHA project page explains this in greater detail. Research papers in PDF format are available for download on the Google reCAPTCHA website.

Google acquired reCAPTCHA in 2009 and describes usage and further background in the reCAPTCHA FAQs:

ReCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows.

ReCAPTCHA is free

While ReCAPTCHA is free to use, including the API, be aware that it is not open source software.

Other uses

ReCAPTCHA is best known for historic text digitization and spam filtering, which is an information security measure.

Answers to reCAPTCHA challenges are used to digitize textual documents… a combination of multiple OCR programs, probabilistic language models, and the answers from millions of humans on the internet, reCAPTCHA is able to achieve over 99.5% transcription accuracy at the word level….
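To make the idea of pooling human answers more concrete, here is a minimal sketch in Python. It assumes the commonly described two-word scheme: one word is a control word whose answer is already known, and the other is a word that OCR could not read. Only users who get the control word right have their guess for the unknown word counted. The function names and the `min_votes` threshold are my own illustrative choices, not part of any reCAPTCHA API, and the real service also weighs OCR output and language models.

```python
from collections import Counter


def grade_and_record(control_answer, control_truth, unknown_answer, votes):
    """Grade a two-word challenge.

    The user passes if the control word matches. Only then is the guess
    for the unknown word recorded as one vote in `votes`.
    """
    passed = control_answer.strip().lower() == control_truth.strip().lower()
    if passed:
        votes[unknown_answer.strip().lower()] += 1
    return passed


def accepted_transcription(votes, min_votes=3):
    """Return the leading guess once it has at least `min_votes` votes.

    `min_votes` is a hypothetical threshold chosen for illustration.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


votes = Counter()
grade_and_record("morning", "morning", "upon", votes)   # counted
grade_and_record("mourning", "morning", "open", votes)  # failed, not counted
```

The design choice worth noticing is that the same user input does two jobs: the control word decides pass or fail, and the unknown word contributes to the transcription.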

OCR stands for Optical Character Recognition. Compare the accuracy of standard OCR versus reCAPTCHA transcriptions of a medium quality scanned document on the reCAPTCHA digitization accuracy website. See some humorous reCAPTCHA examples from the official Google reCAPTCHA blog. Google announced an audio version of reCAPTCHA in 2009.

MailHide is another application, where potential for spam is reduced by requiring a reCAPTCHA challenge in order to disclose an otherwise partially obscured email address. More details are available in my post about MailHide from last month.
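As a rough illustration of what "partially obscured" could mean, the Python sketch below keeps the first few characters of the local part and hides the rest. This is only my guess at the general idea. The function name `obscure_email` and the `visible` parameter are hypothetical, and the sketch does not reproduce how MailHide itself encodes or reveals addresses.

```python
def obscure_email(address, visible=3):
    """Return a partially hidden form of an email address.

    Shows at most `visible` leading characters of the local part (and no
    more than half of it when the local part is short), then an ellipsis.
    """
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    shown = min(visible, max(1, len(local) // 2))
    return f"{local[:shown]}...@{domain}"


print(obscure_email("johndoe@example.com"))
```

The full address would then be disclosed only after the visitor solves the challenge.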

Recent developments

Recent research in the area of computer security led to some surprising discoveries about CAPTCHA and spam. Initially, it appeared that the CAPTCHA challenge had been defeated on a large scale, though only in particular regions. That was not true, though. Human interaction of an unanticipated sort was still required to get past the CAPTCHA for each and every spam comment and email that got through.

*Work continues on the original CAPTCHA project.

Chrome Developer Tutorial

This is an excellent tutorial for learning how to use the Developer Tools in the Google Chrome browser. The hyperlink goes to the official Google Groups site for the open source Chromium project, not to a third-party provider!

Check your webpage! Find errors! Reduce Page Load Times!

Learn how to use EVERYTHING: elements, resources, scripts, timelines, profiles, storage and the console. Once you learn how to use the developer tools, you own the keys to the kingdom.

There are some salient points, and I selected four articles to explain them more completely. Note that the article order does not reflect on Zemanta, as the choices were not ranked by relevance to web development. They were idly selected in this order according to my own idiosyncratic whims.

Click to Play

Initially, I thought this might be a fun game. I was wrong. It refers to in-line advertising links and videos in general.

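Sandboxing

This is important! Recall that the Google Chrome browser includes Adobe’s Flash Player as a built-in feature. Strictly speaking, a bundled component like Flash is a plug-in rather than a Chrome Extension, though the two are easy to confuse. Chrome and Adobe offer a “sandbox” for Chrome’s Flash component. A sandbox is a circumscribed “safe” area in which code runs with restricted access to the rest of the system, so a mishap there, e.g. a crash or an exploit, is less likely to take down the browser or reach the rest of the computer. Developers can use it for testing, and it is also useful for non-developers who might want to “contain” their Flash usage due to a temporary concern about security.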

Web Design

This provides more information about Chrome Extensions*, which are related to the sandboxing topic above.

Browser Cache

This will reveal the location of the elusive browser cache on your computer. It is easy to find for other web browsers; the Options menu in Internet Explorer is the first example that comes to mind. The cache location is not nearly as obvious for Chrome. In fact, I need to read the article myself, as I have no idea where Chrome is storing my browser cache.

Related Articles: Selected by me from the Zemanta suggestions:

* There are many Chrome Extensions available. Each adds to the memory usage by the browser. If you load up too many, you can definitely hamper Chrome’s delightful responsiveness. That requires a certain effort.

Extensions will be a topic for a separate post.

Quality-of-Life in the Chrome O/S Cloud

Google Web Toolkit (“GWT”) is a productivity tool for developers. It is a development toolkit for building and optimizing complex browser-based applications. GWT is used by many products at Google, including Google AdWords and Orkut. It’s open source, completely free, and used by thousands of developers [worldwide].

What programming language would be the most accessible for Google Chrome O/S apps development?

These are the existing constraints:

  1. Android apps are coded in Java.
  2. Chrome browser apps are JavaScript.
  3. A Java programmer can use a web toolkit to “translate” Java into JavaScript.

However, it will be more difficult to go in the other direction. That is, a PHP programmer can pick up JavaScript and create apps for the Chrome browser, but Android apps require knowledge of Java. This is the reverse of item 3 (above), and is much more challenging.

Perhaps there is a unified language for both scripting and programming the core functionality of the app?

GWT Logo

Google Web Toolkit does that!

GWT certainly lets you write Java apps, then compile them into JavaScript. And it might get even better!

How? With a consolidated toolkit, based on GWT. Such a consolidated toolkit could be used to write an Android app that also works on Chrome O/S as a web app, without the need for coding in Java, only in JavaScript.

Google users pressed into service in war against spam

Google (GOOG) recently made an official announcement offering a Personal Blocklist extension for Chrome browser users. I am weighed down with far too many Chrome browser extensions already, so I haven’t tested this one. Technology press coverage of the news slightly surprised me:

Google (GOOG) is concluding that if people are so up in arms about its declining search results, then it will let the masses get to work in helping refine its search technology…

Spam Protection Extension for Chrome browser

While amusing (I’ve supplemented my TechCrunch reading with GigaOM lately), it was more in line with what I expect from The Onion. Yet it is correct. The size and growth of the spam problem warrants this reaction from the press, as well as the public and many businesses. All express frustration with spam and electronic detritus.

It seems to me that Google is addressing spam with a two-pronged initiative. The Google War on Content Farms of a few weeks earlier was directed at particularly spammy e-commerce merchants and services. The Personal Blocklist browser extension is the second part, and is directed at e-commerce consumers and users in general.

Basic search

Search!

In a worst-case scenario, this can be viewed as a sign that the internet will soon become almost unusable due to clutter from impenetrable volumes of advertisements and duplication of once original but now outdated content. That is the most generalized definition of spam. As a matter of quality control, Google DOES need to provide meaningful results, with a minimum of spam, to Google Search 2.0 users.

What can be done?

Is Google evil?

Is it Google’s fault? Is Google greedy and betraying the public’s best interests? No, not particularly.

Google is a publicly traded company, a business with stockholders. It is not a public utility. Google employees and Google operations are not funded by the taxpayers of any nation. It is very easy to forget that. The model of free online services is wonderful, and benefits everyone, everywhere, particularly in countries where what is considered a nominal cost in the U.S.A. would be prohibitively expensive. Much of the U.S. and global economy, as well as the public in general, are dependent upon free Google services to some degree. This is analogous to physical infrastructure. It is digital infrastructure.

Infrastructure is usually part of the public sector

In order to fund the model of free internet search, and free Google products, Google sells online advertising. And so the World Wide Web’s spam problem reduces in some part, though not entirely, to the principal-agent problem. Moral hazard. Conflict of interest.

Avoidance of moral hazard is a major benefit of having a public sector, and government. When the public sector functions as it should, it reduces biased behavior due to profit-seeking and other motives.

The dilemma for Google as a company

Google needs the advertising revenue provided by AdSense customers (some of whom are the Content Farmers). That is why Google must offer a quality product to the public. Not because the public are Google customers. Google search is free of charge. While it may be unethical to sell a poor-quality product, there is no law against offering crummy goods and services free of charge. That happens all the time. No one wants something that is useless or gives much less value than an alternative provider.

Good corporate citizenship is a consideration, but only a minor one. Google must provide a quality product because the public’s use of free Google products drives revenue from customers. Google is obligated to:

  • Customers. Primary customers are advertisers and revenue-generating businesses, for-profit and otherwise
  • Employees. The people whose paycheck it provides for going to work every day

Remember though that the motivation for these obligations is that they may in turn give value to shareholders in the company itself.

The war against the Content Farmers is dangerous for Google. The Google anti-spam efforts must be targeted enough to cut spam and increase search user satisfaction while not alienating the source of funding that sustains Google and allows the company to offer services at all.

IPv6 Day is on the way

IPv6 Day is scheduled for 8 June 2011. Many internet service providers (ISPs), major technology companies and, of course, Google, Yahoo and Microsoft will be participating. The complete list of participants is available from the Internet Society (ISOC).

The ISOC sees to the overall well-being of the global internet. This is a very important task. The internet is the framework for most of our digital infrastructure.

What is IPv6?

IP is the abbreviation for “Internet Protocol”. No, not “Intellectual Property”! At least, not in this context. Internet Protocol version 4 (IPv4) is the current standard. Internet Protocol version 6 (IPv6) will be the new internet protocol.

IPv6 Day is only a test day

It is not a permanent transition to the new standard. IPv6 Day is a 24-hour period during which participants will offer their content over IPv6 in addition to IPv4. Complete transition to an exclusively IPv6 internet is still in the future. There is a very real urgency though. The most pressing concern is IP address availability. The central pool of unallocated IPv4 addresses, managed by IANA, was completely depleted in February 2011.
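For a sense of scale, the short Python sketch below compares the two address spaces using the standard library `ipaddress` module. IPv4 addresses are 32 bits long and IPv6 addresses are 128 bits long. The sample addresses come from ranges reserved for documentation, and the helper name `ip_version` is just an illustrative choice.

```python
import ipaddress

# Total number of addresses in each protocol's entire address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128


def ip_version(text):
    """Return 4 or 6 depending on which protocol the address belongs to."""
    return ipaddress.ip_address(text).version


print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: about {ipv6_total:.2e}")
print(ip_version("192.0.2.1"), ip_version("2001:db8::1"))
```

IPv4 tops out at roughly 4.3 billion addresses, which is why exhaustion was inevitable once the number of connected devices grew past that.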

This animated IPv6-themed Google logo was featured during  the 2010 IPv6 Implementors Conference. Unlike the usual Google *.jpeg logo, this image is formatted as a *.gif file.

Google IPv6 logo that wiggles for transition from IPv4

Special Google IPv6 logo using GIF format

Google worked with internet organizations on the IPv6 transition for many years:

Since 2008, Google has hosted conferences focused on addressing and sharing IPv6 implementation experience, designs, and associated research.

—  Google IPv6 Implementors Conferences

Google photos for businesses

Business Photos from Google are now available to businesses with listings in Google Places.

Google Places listings are seen by any Google Maps user. This feature should help small businesses who want to reach more local customers. Google photographers will take interior shots of businesses, which is distinctly different from the exterior imagery ordinarily seen on Google Maps.

…sign up for a photo shoot by Google trusted photographers. The images will appear on your business’ Place page, and as 360-degree imagery using Street View technology.

Availability

The Business Photos feature is being rolled out gradually based on geographic area. The comprehensive FAQ page includes locations and timelines.

Businesses interested in participating must complete a short application.

Google Plus One

After a long wait, Google Plus finally arrived in April 2011. It is part of Google’s incremental approach to going “social”.

Google +

Various red herrings were the subject of much intense debate while the internet and social media world awaited a new social product from Google. Google Circles was one such false lead. (In fact, it was a discontinued beta product dating back to 2006 or 2007.) There were other equally incorrect conjectures, some of which I wrote about, as I am as curious as other Google-watchers.

Social Family Identity

Now that Google Plus is here, it is uncertain where it will fit into the overall Google social product family, so to speak. The short-lived Google Hotpot was merged into Google Places after a brief run of only six months. The fate of Google Buzz is not questioned, yet it is confusing for webmasters to know which product to offer, Buzz or Plus.

My minor Google Research observation

A peculiar hint about the fate of Google Buzz, no more reliable than any oracle or augur, came to light for me today. I was browsing the Google Research blog and website. I noticed that the contact choices for Google Research were “subscribe to blog”, “follow us on Twitter” and “follow us on Google Buzz”. However, the URL for the Google Buzz option returned an Error 404, Not Found.

Shortly after, I took a quick glance at the Official Google Research account on Twitter. I noted that a Google Buzz profile was the URL contact provided for Google Research. However, this was the very same Google Buzz URL as the one listed on the Google Research blog site. And of course, it also returned an Error 404 Page Not Found.

This is merely an observation. It might not have any significance whatsoever with respect to the status of Google Buzz. I have been wrong before!

UPDATE: July 2011

I received a very courteous message from Google Research about the error in the Buzz URL on both blog and Twitter profile. Google Research promptly updated the Google Buzz URL to a valid one in both locations. (I later invited Google Research to join Google+, but was informed that organizations were not yet allowed to have Google+ profiles).

My earlier concerns about the imminent demise of Google Buzz based on Google Research activity were not appropriate at this time.

Google Translation Story Continues

Last month, developers whose applications and websites depended on the Google Translate API and the underlying Google machine translation were shocked by an unexpected announcement.

Google Says Translate and other APIs WILL be deprecated

Google APIs are deprecated all the time. Usually they are replaced with comparable services or APIs.

But that morning was not like anything else. That morning became cruel and sad when the world heard the news. The linguists and webmasters were taken aback, shocked and stuttered in disbelief. The world learnt on May 26, 2011 that Google is no longer going to support its free machine translator also known as Google Translate

via Lackuna.com: Slaughtering Machine Translators – Who Is Going To Replace Google? 

The Translate API documentation on Google Code makes the situation very clear:

The Google Translate API has been officially deprecated as of May 26, 2011. Due to the substantial economic burden caused by extensive abuse, the number of requests you may make per day will be limited and the API will be shut off completely on December 1, 2011.

Google suggests the Translate Element as an alternative to the API for website translation and similar needs.

Welcome to the Indic web

Deprecation of the Google Translate API does not mean an end to human usage of Google Translate.

This becomes very clear with this June 21 announcement on the official Google blog, Google Translate welcomes you to the Indic web. Google Translate announced support of five languages, in alpha* status: Bengali, Gujarati, Kannada, Tamil and Telugu.  According to the post,

In India and Bangladesh alone, more than 500 million people speak these five languages.

Special fonts need to be downloaded to use Google Translate with these Indic languages. The post has links to get access to these fonts, free of charge.

It is not clear whether these five alpha languages will be included in the deprecated Translate API before it is taken offline permanently on December 1, 2011.

* Google Translate has introduced nearly a dozen alpha languages since 2009. At present, Google Translate supports 63 languages.

Prediction API

The recent release of the Google Prediction API Version 1.2 seemed oddly, well, magnanimous to me! Given the investment of intellectual capital and resources, I am surprised that Google would be so generous.  Allowing access to the Prediction API means that Google is giving access to its in-house machine learning algorithms to external users.

1939 Ford pick-up truck will not likely use the Google Prediction API though other Ford products will

The official Google Code blog post, Every app a smart app, dated 27 April 2011, suggested many possible uses for the Prediction API. Some of the more interesting included:

The last item on the list has the potential, but not certainty, of causing serious privacy concerns. I’m guessing that customer feedback based on structured data is another potential use for the API.

I noticed that Ford Motor Company has plans for the Prediction API, specifically for commuters driving electric vehicles (EV). Apparently, there is a fair amount of “EV anxiety” due to limitation on range of travel. The Prediction API could be used to mitigate those concerns. AutoBlog is an online publication for automobile enthusiasts. It featured a great slide show demonstrating how Ford intends to make use of the Google Prediction API.

The Prediction API is available on Google Code. This is not the first release of the Prediction API. I’m uncertain whether versions before 1.2 were restricted in some way. (Google often grants API access to developers initially, and later, after ironing out any bugs or unexpected problems, opens the product to the public.)

Do be aware that a Google Storage account is required for access. Visit the Google API Console to get started.

Try a VeriSign SSL Certificate gratis

Network and data security has really been on my mind lately!

I visited the Symantec and VeriSign websites the other day. I’m not sure if this is a true “limited time special offer” or an ongoing promotional deal that I never noticed until now. Two sorts of SSL (Secure Sockets Layer encryption) certificates are available from VeriSign.

Secure Sockets Layer protection

30-day SSL test-drive

One is the standard type that is desirable for websites that are accepting payment data or collecting other sensitive personal information from users. VeriSign refers to this as a Production Certificate. It includes use of the distinctive VeriSign Trust Seal, for use on SSL websites.

The other type is an SSL Test Certificate. Applications developers who want to confirm that SSL encryption is functional in a test (pre-production ONLY) environment should select this. It doesn’t include display of the Trust Seal, because it isn’t intended for use with applications on the public web. Both are available for free, for a 30-day trial period.
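When working with a time-limited trial certificate in a test environment, it can help to check how long the certificate is actually valid. The Python sketch below computes the validity window from the `notBefore` and `notAfter` strings, in the format that the standard library's `ssl.SSLSocket.getpeercert()` returns. The dates shown are made up for illustration and the helper name `validity_days` is my own. Nothing here is specific to VeriSign.

```python
import ssl


def validity_days(not_before, not_after):
    """Return the length of a certificate's validity window in days.

    Both arguments use the certificate time format, for example
    "Jul  1 00:00:00 2011 GMT".
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return (end - start) / 86400  # seconds per day


# Hypothetical dates for a 30-day trial certificate.
print(validity_days("Jul  1 00:00:00 2011 GMT", "Jul 31 00:00:00 2011 GMT"))
```

In practice the two strings would come from the certificate presented by the test server rather than being typed in by hand.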

Try a VeriSign Certificate* today!

There may be superior alternatives to VeriSign SSL authentication. Regardless of vendor choice or implementation, it won’t hurt to contemplate data security, given the almost daily news reports of DDoS, DoS and other attacks. Or disclosure of yet another 0-day vulnerability or data breach.

* No, I’m not a paid endorser. I hoped someone might find it helpful and informative. Me, for example!

UPDATE: July 30, 2011

I just noticed that VeriSign has another offer: a 60-day free trial for a VeriSign Seal. See the VeriSign website for more information.

VeriSign offers both SSL and non-SSL products

What is the difference between the Trust Seal and the Secured Seal?

Like the VeriSign Secured Seal, the VeriSign Trust Seal shows that a site is authenticated by the high standards of VeriSign… The VeriSign Trust Seal is free with the purchase of any VeriSign® SSL Certificate. It can also be purchased separately for web sites that do not require SSL for securing online transactions. The VeriSign Trust Seal provides a cost-effective way to establish trust on your site without installing an SSL Certificate.

Emphasis is mine. However, VeriSign prominently displays this advisory on the Trust Seal FAQ page:

If your Web site uses SSL, you must use VeriSign SSL in order to display the VeriSign Trust Seal.

I’m uncertain, but I suspect that the 30-day Trust Seal deal includes SSL certification, which is actually the VeriSign Secured Seal. The 60-day special probably does not. In other words, it offers the Trust Seal but not the SSL certificate, and is suitable only for non-SSL websites.