Cryptographers are getting closer to enabling completely private Internet searches


original version Of this story appeared in quanta magazine,

we all know We should be careful about the details we share online, but the information we seek can also be revealing. Find driving directions, and estimating our location becomes much easier. Check the password in the repository of compromised data, and we risk leaking it ourselves.

These situations give rise to an important question in cryptography: How can you pull information from a public database without revealing anything about what you accessed? This is the equivalent of checking a book out of a library without the librarian knowing which book it is.

Creating a strategy that solves this problem — known as private information recovery — “is a very useful building block in many privacy-preserving applications,” said David Wu, a cryptographer at the University of Texas at Austin. Since the 1990s, researchers have dodged this question, improving strategies for accessing databases privately. A major goal, while still impossible with large databases, is the equivalent of a private Google search, where you can anonymously sift through piles of data without doing any heavy computational lifting.

Now, three researchers have created a long-awaited version of private information retrieval and expanded it to become a more general privacy strategy. This work, which received the Best Paper Award at the Annual Symposium on Theory of Computing in June 2023, overcomes a major theoretical hurdle in the path to truly personal discovery.

,[This is] Something in cryptography that I think we all wanted but we didn’t believe existed,” said Vinod Vaikunthan, a cryptographer at the Massachusetts Institute of Technology who was not involved in the paper. “This is a historic result.”

The problem of private database access emerged in the 1990s. At first, researchers believed the only solution was to scan the entire database during each search, which would be like having a librarian scan each shelf before returning with your book. After all, if the search misses a section, the librarian will know that your book is not in that part of the library.

This approach works quite well on a small scale, but as the database grows, the time required to scan it increases at least proportionally. As you read from large databases—and the Internet is very large—this process becomes prohibitively inefficient.

In the early 2000s, researchers began to suspect that they could avoid the full-scan hurdle by “preprocessing” the database. Broadly speaking, this would mean encoding the entire database as a particular structure, so that the server can answer a query by reading only a small part of that structure. Theoretically enough careful preprocessing could mean that a server hosting the information goes through the process only once, allowing all future users to access the information privately without much effort. .

For Daniel Wyche, a cryptographer at Northeastern University and co-author of the new paper, it seemed too good to be true. Around 2011 he started trying to prove that such a plan was impossible. “I was convinced there was no way it could be done,” he said.