By Paola Cantarini
Jacob Appel – BIO
Jacob Appel works in the field of AI auditing at ORCAA, the company founded by Cathy O'Neil and specialized in risk consulting and algorithmic auditing.
This interview was originally conducted in English on December 13, 2023.
The initial version of the interview, conducted by Paola Cantarini, was revised by Guershom David (Master's student in the Graduate Program in Political and Economic Law, where he conducts applied research on the management and creation of systemic artificial intelligence for higher education, assisting persons with disabilities through the MackriAtIvity project, currently hosted at the incubator of Universidade Presbiteriana Mackenzie. He is also enrolled in the lato sensu graduate programs in Civil Procedure and in Criminal Law and Procedure, all at Universidade Presbiteriana Mackenzie, and serves as deputy coordinator of the MackPeace Supervised Guidance Clinic for Migrants and Refugees).
Original version
Paola Cantarini: What's your area of expertise? Would you start by telling us about your work related to AI and data protection?
Jacob Appel: Thank you for having me. It's an honor to be here. I work in the field of A.I. auditing, as you already know, so I'm happy to have the opportunity to talk a little more about that. I work with a company called ORCAA, which stands for O'Neil Risk Consulting and Algorithmic Auditing. Cathy O'Neil is the founder, the CEO, my business partner, and the author of the book "Weapons of Math Destruction", published in 2016. The book explains some of the ways algorithms, at that time called big data systems, could go wrong: how they might have unintended consequences, or might not work the way their inventors initially envisioned. It is an influential book, and more than seven years after its publication, some of the failures she described have really come to pass; algorithms have started to misbehave in ways she envisioned. Not long after writing the book, O'Neil created the company, which I joined to help put some of her ideas into practice. When we started, the field of algorithmic auditing was not really defined. There are still many unanswered questions, such as "What is an algorithmic audit?", which remains somewhat open, but there has been a lot of progress in the last few years.

We talk about three areas of projects we work on as an algorithmic auditing company. The first is when clients invite us to audit an algorithm or A.I. system they are using. This usually happens when the company is worried about a specific legal risk, or a reputational risk related to automated systems, and hires us to come in, inspect the system, and figure out what the potential problems are so they can be mitigated. We call this a voluntary audit. The second kind of project involves law enforcement agencies and attorneys general in the United States. These are public-sector actors representing the public interest, and their work can result in investigations, lawsuits, and complaints against companies suspected of breaking the law. In these cases, we help the attorneys general investigate suspected algorithmic wrongdoing. Imagine, for example, a lending company making student loans that uses an A.I. system to help with the underwriting. If the attorney general believes the company is violating fair lending regulations, they might hire us to help with the investigation, specifically with the analysis of the algorithm: how can we figure out from the data whether the A.I. system was violating fair lending laws? That kind of project is about helping the State enforce laws whose application to A.I. systems is not yet clearly worked out; we are figuring out how to translate those laws into A.I. terms. In the third category of projects, we work alongside regulators, helping design frameworks and testing approaches that can apply to a whole sector. Over the past year, we've worked with two insurance commissioners in the United States. In the U.S., insurance is regulated on a state-by-state basis: each state has an insurance commissioner who makes the rules for the field in that state, so there are fifty different insurance commissioners and, as a result, fifty different sets of rules.
Furthermore, once a state passes a law prohibiting discrimination by A.I. systems in insurance, we assist the commissioners with the implementation of that law. They have a mandate from the state law saying that unfair discrimination by A.I. is illegal, and they usually invite us to help with the rulemaking that implements it. So that's the third category we engage in: contributing to implementing standards.
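To make the fair-lending example concrete, here is a minimal sketch of the kind of disparity check an audit might start from, assuming a small, entirely hypothetical table of loan decisions with a group label per applicant; the column names, data, and the four-fifths threshold are illustrative only, not a description of ORCAA's actual methodology.

```python
import pandas as pd

# Hypothetical loan decisions: 1 = approved, 0 = denied.
# Column names and values are invented for illustration.
df = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "A"],
    "approved": [1,   1,   0,   1,   0,   0,   1,   1],
})

# Approval rate per group.
rates = df.groupby("group")["approved"].mean()

# Adverse impact ratio: lowest group rate over highest group rate.
# The informal "four-fifths rule" flags ratios below 0.8 for closer review.
air = rates.min() / rates.max()
print(rates)
print(f"Adverse impact ratio: {air:.2f}" + ("  <- review" if air < 0.8 else ""))
```

A real investigation would go much further (statistical significance, legitimate business explanations, proxy analysis), but a disparity ratio like this is a common first gauge.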
Paola Cantarini: Is there a need, and therefore the possibility, for a worldwide law to regulate AI globally, at least setting minimal standards?
Jacob Appel: Since the question has two parts, I'm going to address them separately. I remember taking a philosophy class once where they said "ought" implies "can": if we should do something, it must be possible to do it. I don't think that holds here. It would be desirable to have coordination, but I'm skeptical that it will happen, which is why I want to treat each part separately. Is there a need for international coordination, or for a truly universal set of standards? Maybe; we can't really know at this point. Take, for example, our work in the U.S., where we often encounter questions of discrimination and fairness. Those areas of focus come up when we work through our analyses to distinguish the different ways A.I. systems can fail, and we are often concerned about failures related to equity, scenarios that might privilege one group or gender over another. Those questions come up for us all the time, and what we've learned over the past years is that they are local in nature. The protected classes in the U.S., especially regarding race and ethnicity, are somewhat different from what they would be in Brazil. Each country has a different sense of racial categories, and perhaps different norms and standards in its own anti-discrimination laws, which may or may not match another place's. On a very practical level, that makes it difficult to imagine one universal worldwide standard. Take representativeness in training data as an example: we might all agree on the principle that training data should reflect all races and genders, but if we had to specify what the standards should be to ensure adequate representation, the standard would look different for every country. In China, the standards for ethnic representation would be different than in the U.S., since the demographics are not the same. That's the bad news: there are a lot of practical issues that make it hard to truly standardize. Also, the technology is still developing, and this point is especially relevant for Brazil: the extent to which the language-model family of A.I. is tied to the English language over other languages is significant. Is this something Brazil has proposed to investigate? I am not sure how well these systems perform in other languages, given the training data involved. Take ChatGPT, which is all the rage here, trained on data gathered from the Internet. That data is mostly English-language text, so the model is largely based on American English. The companies try to build out support for other languages, but the model behaves differently in those languages because there is less training data for them. So a universal international standard would already play out very differently for each language. From a practical perspective, that reality makes it hard even to imagine a single universal standard. However, is there a need for it? There are absolutely benefits to international cooperation and coordination around how to regulate this technology, or around how to engage with it, both socially and from a regulatory standpoint. We see this in the U.S., as I mentioned regarding our work with the insurance commissioners, where insurance is regulated state by state: whereas other issues are managed at the federal level, here there's an opportunity for different states to do different things.
This allows a bit of experimentation: one state tries to regulate A.I. in one way, and another state tries a different approach. Similarly, across countries, the U.K. and the E.U. are going to regulate A.I. in one particular way, the U.S. in another, and China in yet another. If there's a global community of practice around the issue, we can learn from each other's experiments. But that's different from having a single, universal standard that applies everywhere.
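As a rough illustration of why representation standards end up country-specific, here is a small sketch, with invented numbers, that compares group shares in a hypothetical training set against a local population benchmark; a different country would plug in entirely different group labels and census figures.

```python
from collections import Counter

# Hypothetical demographic labels attached to training examples.
training_labels = ["group_x"] * 700 + ["group_y"] * 250 + ["group_z"] * 50

# Placeholder population benchmark for one jurisdiction (shares sum to 1.0).
# Another country would use entirely different groups and figures.
benchmark = {"group_x": 0.50, "group_y": 0.35, "group_z": 0.15}

counts = Counter(training_labels)
total = sum(counts.values())

for group, target in benchmark.items():
    share = counts.get(group, 0) / total
    gap = share - target
    flag = "under-represented" if gap < -0.05 else "ok"
    print(f"{group}: data {share:.2%} vs benchmark {target:.2%} ({flag})")
```

The arithmetic is trivial; the hard, local part is deciding which groups and which benchmark figures belong in the dictionary at all.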
Paola Cantarini: How would the so-called "trade-off" between innovation and regulation work? Or would regulation by itself prevent or compromise innovation and international competition? According to Daniel Solove, in his book "Nothing to Hide: The False Tradeoff Between Privacy and Security" (Yale University Press, 2011), this would be a mistaken concept. Could you comment on this point?
Jacob Appel: Interesting. That sounds like a pretty anti-regulation stance, so allow me to pick the argument apart a little. The argument, as I understand it, is that since we already have constitutional principles and data protection laws, it is superfluous or unnecessary to add A.I.-specific regulation. I'm not familiar with the Brazilian constitution, regulations, or data protection law, but if I can make an analogy with the U.S. again: we also have not just constitutional principles but specific laws in distinct sectors like hiring, credit, and housing that prohibit, for example, discrimination on the basis of race, age, religion, or gender in those areas. Those laws exist, and there is legal precedent and there are established standards for complying with them. However, when these latest A.I. and machine learning systems emerged, those laws in theory applied to the new systems, but it has not been worked out how exactly, or what compliance means in a practical sense. For example, in housing, it is not allowed to discriminate in housing offers on the basis of race. Well, if we're talking about an A.I. system that doesn't use race explicitly but works through many proxies for race, is it compliant with the law? Some of these are simply open legal questions. So, picking apart the argument you conveyed, I don't think it's sufficient to have constitutional laws that talk about fairness; we also have to verify whether those laws have been translated into terms that can really apply to A.I. systems. Perhaps it doesn't need to be new laws per se; it could simply be regulations or rulemaking that implement existing laws. It might only take a constitutional principle we already have against discrimination, worked out so that we understand how it applies to A.I. systems used in these specific areas. Again, to answer your question: is there a tradeoff between innovation and regulation? Framing it that way seems a little simplistic to me, because there are always constraints that bind innovation. The real question is how those constraints are brought into the design process of these new A.I. systems. Currently, without a concerted effort to translate existing rules into terms that apply to A.I. systems, what we have is data scientists creating new technologies and tools and rolling them out, and maybe somebody else then checks them from a compliance angle, but they're not quite sure what to check; a lot gets lost in translation when inspecting those systems. We still lack that connection, which could come in the form of laws, regulations, or rulemaking that address what an A.I. system is in a given context. In the context of housing or hiring, what does it mean for an A.I. system to be compliant, fair, or non-discriminatory? We need to be specific about what we require on those issues.
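One way to make the "proxies for race" question testable in data, sketched under simple assumptions: even when a model never sees the protected attribute, an auditor can check how well the model's other inputs predict that attribute, since strong predictability suggests the features encode a proxy. The feature names and data below are synthetic and hypothetical; this is one possible diagnostic, not a legal standard.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000

# Hypothetical "race-blind" features the decision model actually uses.
zip_income = rng.normal(50, 15, n)   # stand-in for a geographic proxy
credit_len = rng.normal(10, 4, n)
X = np.column_stack([zip_income, credit_len])

# Hypothetical protected attribute, correlated with zip_income to simulate a proxy.
protected = (zip_income + rng.normal(0, 10, n) < 45).astype(int)

# If the race-blind features predict the protected attribute well above chance
# (AUC >> 0.5), they encode a proxy, and downstream decisions can still discriminate.
X_tr, X_te, y_tr, y_te = train_test_split(X, protected, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"Proxy AUC for protected attribute: {auc:.2f}")
```

A high AUC does not by itself establish a legal violation; it flags where the "we don't use race" defense breaks down and closer analysis of outcomes is needed.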
Paola Cantarini: What, in your view, is meant by AI governance?
Jacob Appel: We often use the metaphor of the cockpit of an airplane when we talk about this topic. Think of an A.I. system as an airplane. An airplane has a cockpit with controls, and a bunch of gauges and dials that provide the minimum information needed to maneuver it, giving the pilot up-to-the-minute data so the plane can be flown safely. The cockpit of an airplane is very different from the cockpit of a train, which differs from the cockpit of a car. Different kinds of vehicles have different cockpits, with different gauges and dials, because each kind of vehicle carries different risks. What you want is the right information so you can address the potential risks and benefits, preferably in real time. Applying that analogy to governance of an A.I. system: we need to design the cockpit so that the system is safe to fly, or safe to drive, just as with a vehicle. We often describe our work as helping to design that cockpit: figuring out which dials and gauges are needed, and then, for each dial or gauge, what the thresholds are. What's the red line? What are the maximum and minimum values? If a reading goes above the maximum or below the minimum, is that going to be a problem for whoever is flying or driving? That's how we think about governance, by creating that kind of cockpit. Developing it carefully takes a lot of work. To come up with the right dials, we must really understand how the system could fail, and then figure out which metrics to monitor so that possible failures can be predicted and caught early. Our work involves a lot of qualitative analysis and extensive communication with stakeholders. We look at an algorithmic system in the context where it's going to be deployed and identify the stakeholder groups involved in that scenario. That generally includes the people building and deploying the algorithm as well as the people who are scored or assessed by it. We take a broad view of who the stakeholders are, research each stakeholder group, and ask how the specific technology could fail them, or what their concerns are in that context. Only then can we translate their concerns into dials and gauges. It's a lot of work, because the cockpit has to be tailored to those specific contexts. Now, back to your earlier question, whether there could be a universal rule regulating AI around the world: think about the cockpit analogy. There is no single set of dials and gauges, no single cockpit, appropriate for every different kind of vehicle. We could all agree that it's important to have a cockpit, and we could agree on general principles of cockpit design. I can imagine consensus at that level, but it would still be a tailored exercise to create the cockpit for any system deployed in a particular place, with specific stakeholders.
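To show how the cockpit metaphor can turn into something concrete to monitor, here is a minimal sketch: a few gauges, each with a safe operating range, checked against the latest readings. The specific metric names and thresholds are invented for illustration; in practice they would come out of the stakeholder analysis described above.

```python
from dataclasses import dataclass

@dataclass
class Gauge:
    name: str        # what the dial measures
    minimum: float   # below this, raise an alert
    maximum: float   # above this, raise an alert

# Hypothetical cockpit for a lending model; real gauges come from stakeholder work.
cockpit = [
    Gauge("adverse_impact_ratio", minimum=0.80, maximum=1.25),
    Gauge("false_positive_rate_gap", minimum=0.00, maximum=0.05),
    Gauge("share_of_decisions_overridden", minimum=0.00, maximum=0.10),
]

# In a live system these readings would be computed from recent decision logs.
latest_readings = {
    "adverse_impact_ratio": 0.74,
    "false_positive_rate_gap": 0.03,
    "share_of_decisions_overridden": 0.02,
}

for gauge in cockpit:
    value = latest_readings[gauge.name]
    status = "ALERT" if not (gauge.minimum <= value <= gauge.maximum) else "ok"
    print(f"{gauge.name}: {value:.2f} [{gauge.minimum}-{gauge.maximum}] {status}")
```

The point of the structure is that each gauge is contextual: a different system, deployed for different stakeholders, would get a different list and different red lines.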
Paola Cantarini: At this year's Venice Architecture Biennale (2023), the theme of the Brazilian pavilion is "Earth and ancestry", that is to say, decolonization ("De-colonizing the canon", Brazil's "Earth" pavilion at the Venice Biennale). Would it be possible to escape such colonialist logic, which is also present in the AI/data field?
Jacob Appel: Thank you for saying more about it and clarifying the context of the question; now I have a better idea. By colonialist logic, do we mean the offshoring of content moderation, the fact that these systems rely on a lot of overseas, less expensive human labor to feed them? Is that what the question implies? If that's the kind of colonialist logic we're talking about, I wish I could give a hopeful answer. What I mean is: yes, it's possible to do better than what we're doing. Let me talk about two issues you've mentioned. One is consent to share our data when we use these systems, and the other is the outsourcing of the human work of content moderation and human feedback for these large model systems. On the outsourcing of human labor, of course we can do better. Much of this skilled content moderation work is done from Brazil, India, or some African countries, by workers at a clear disadvantage relative to the importance of what they are asked to do. Still, if content moderation is something we need, and it is, we could insist that the people doing it be paid adequately. We could insist that they be able to organize, perhaps into a union, the way workers based in the U.S. might, but I don't know; I'm not a labor lawyer, so I can't analyze the rules and say exactly what could be improved. At a high level, though, content moderation exists because social media platforms host millions and millions of pieces of content posted every day, and the companies have to set up systems to decide whether that content is safe or acceptable. Their commercial incentives, which come from advertising, shape how they do this, and so does the way they hire: they set up moderation to be as cheap as possible while satisfying the minimum requirements of the law. If we, as a society, decide that we want better content moderation, we could insist on it, presumably through regulation, and push companies to do a better job. So I think that is possible, but it's hard to see a political pathway to doing it. The place where I see the biggest opportunity is where failures of moderation are really hurting people. Take the U.S. as an example: there have been some actions around mental health, especially eating disorders linked to social media among teenagers and young adults, impacts that are being studied and talked about a bit more. There was the Facebook Files leak a couple of years ago, where a whistleblower at Facebook released internal documents showing the company studying the mental health impacts of Instagram on vulnerable groups; their own research found negative mental health effects. But even after that information came out, there is still a lot of this content on the platforms: pro-anorexia content, tons of diet pills being sold through Instagram accounts, all kinds of bad stuff that feeds disordered eating among young people. It would be possible to dedicate more resources to moderating those issues, to focus on them more closely and take the risks of things like diet pills seriously.
That isn't happening, though, because we haven't insisted on it. Right now there are a number of efforts in the U.S., by lawmakers, to make rules that would address this. It's going through the legislative process, and it's pretty hard, even though this is an issue most people in the U.S. agree on: we don't want our kids to develop eating disorders because they're looking at very thin people on Instagram. It's something all political parties can agree on, and it seemed like an easy issue, but it hasn't been easy to make laws around it. So I think it's possible, but I'm a little skeptical about progress coming soon. Then there is the other thing you mentioned, consent for personal data, which is a hard one even on a personal level. Before I was an algorithmic auditor I was already worrying about this, ever since getting Gmail and Google Maps for free, especially Google Maps. I'm not a technical expert on this, but I think about all the work that went into making Google Maps and that still goes on every day. Street View, for example, keeps getting better all the time, and we're not paying anything for it; we all just got it. Obviously, that can't be true. We are paying for it; we're paying for it with our data. That's the deal, and that's been the setup for ten, almost twenty years now. We're way behind in terms of reclaiming our personal data or reclaiming the right to consent. I'm not saying it's impossible to back out of the situation we're in, but it's going to be pretty hard. We've seen some startups in the U.S. trying to take on this issue: instead of just signing up for these services, maybe the companies will have to buy my personal data from me, or offer me a paid option where they don't collect my personal data. And I guess that's happening in Europe now, right? I think I've seen this, maybe as a result of European data protection rules: companies offering a more privacy-preserving version of the service, but the user has to pay money for it. I think that kind of thing would be great, but the companies won't offer it unless they're forced to. The current deal is perfect for them; the data is what is really valuable, and they know that as well as we do. So we are going to have to force them to do something different if we want them to.
Paola Cantarini: What are the main challenges today with the advancement of AI, especially after the controversy around ChatGPT and the "moratorium" requested in a letter/manifesto by Elon Musk and other leading figures?
Jacob Appel: This is the new frontier. One thing I mentioned earlier is important to repeat: large language models such as ChatGPT are primarily built on the English language, with limited support for other languages, and that becomes an issue for the sector when we think about it from a global perspective. As for the moratorium, which is a separate issue, I see it as part of a bigger story emerging here, and maybe it is mostly a U.S. story, since I'm not sure how much it resonates globally. It reflects a split within the AI community. On one side are the people focused on existential risks: the "X-risk" (existential risk) community, the long-termist and effective altruism communities. That is the same community that produced the letter everyone, including Elon Musk, signed, arguing that maybe there should be a moratorium on developing these kinds of systems because they could allegedly lead to catastrophic outcomes, with AI taking over control and leading to the extinction of humanity. So that's one group. The other group is more concerned with near-term impacts: not apocalyptic scenarios, but AI denying parole or bail to certain groups of people, or making it harder for certain groups to get a job, get into college, or get a credit card or insurance, et cetera. We at ORCAA are more aligned with the second group. We focus on understanding the risks and impacts now. What is happening now is not an extinction-level event or an apocalypse; it's about how these systems are already being used and what impacts they are most likely already having. My view is that within this field there is too much focus on existential risks and not enough on the practical, everyday harms to real people that are already happening. I don't mean we should ignore existential risks entirely; they should certainly be thought about. It's just that the existential-risk conversation sometimes serves to cut people out of the discussion. We can all see who wrote that letter and signed onto it: the same people, the same companies, the tech titans. The group worried about existential risk is a very small one, and they present themselves as the only ones who know how to fix the problem they themselves created. It's all very insular; it doesn't invite in the actual stakeholders who are likely to be harmed by this stuff. It's a way of keeping everything within that closed Silicon Valley community. That's my personal take; I don't know if it's intentional, but that's how it comes across. This world of effective altruism has been around for a while, certainly since before these AI companies became so valuable, and it's interesting to see what happened when their own work became one of the issues they might address from a philanthropic standpoint. There are many opportunities for them to enrich themselves and to position themselves and their companies as philanthropically necessary, as important to our safety. The last thing that makes these large language models such a challenge is the very fact that they are so multi-functional. They seem reasonably capable of so many different things.
Things such as summarizing large amounts of text, chatting, and coming up with new ideas or recipes from a prompt: they have so many different potential uses, as I mentioned earlier. When ORCAA audits an AI system or algorithm, we insist on auditing it in context. We don't just look at the lines of code; we look at the system as it is used in the field, to make a particular kind of decision, concerning particular stakeholders. We really consider how it's being used. For large language models, I don't think it's possible to audit ChatGPT as such, because it can be used to do so many different things: writing an employee review, summarizing a bunch of Reddit comments, helping a doctor make a medical diagnosis. These are very different uses, with very different angles and very different real-world stakes, so in my opinion each different use has to be audited in isolation. But currently, many of the people talking about AI safety or AI auditing are the same people who work at OpenAI or Microsoft; they're the people making the foundation models. Any audit that happens at the foundation-model level, of ChatGPT or any other model, without a specific use case or context in mind, is going to be insufficient. It might show you something, but it's not going to address all the risks of any one deployment. I think that's a real challenge with these very multi-use tools, and I'm afraid we're not looking at the right altitude to manage and audit them. The right altitude is the level of the use case in which the tool is being used, but too much of the discussion is at the level of the general tool, and that is itself another challenge. We are pushing precisely for that shift. Even in the last couple of months there's been a lot of action in the U.S.: an executive order came out about a month ago directing many agencies to develop guidance around AI systems, and we're trying to push that effort toward specific use cases. Can we focus on particular use cases, especially when it comes to these large language models, instead of regulating them in general or in the abstract? Let's get down to the specific use case. Moreover, as you said, once we're in a particular use case, we can start to see emergent bias, because it's going to be contextual; it will emerge relative to the context of that use.