
Leaderboard

Popular Content

Showing content with the highest reputation since 09/11/2024 in Blog Entries

  1. I read the introduction from OpenAI. I remember learning about browser design and telling my friends that a browser which can go into a web page and extract things for you would be very useful. I still think that is true, but I also comprehend the level of security problems this leads to. OpenAI clearly comprehends that a lawsuit can hit them, so they have started rolling this out through ChatGPT, which shows one financial issue: it is only for people who pay for ChatGPT Pro. But they plan to expand to Plus, Team, and Enterprise users, so the goal is for this to be a paid service. Now, what functionalities are stated: scrolling on a webpage and interacting with it (typing or clicking) to fill out forms, order groceries (which means a user's financial data), or create memes (are they original?).

When I think of OpenAI, or the computer programs they created, and the larger community of firms and computer programs that modify themselves based on human input to mimic human interaction, what some call AI (which it is not), I am reaffirmed of the value private data has. The internet and its public data model is how OpenAI and others were, in my legal view, able to illegally access enough data to get their computer programs to modify themselves strongly enough to be convincing mimics while not paying the financial price of accessing that data. Now, with Operator and others, the free information on the internet will be sifted through.

So I argue the future of the internet will be walls. Walls are the only answer that offers security in the future. This doesn't mean the world wide web will end; it means it will break up into webs within the larger internet. Data crossing between webs will become expensive, and big firms can pay. In Europe a number of cities already have city-based internets: aside from the world wide web, the city residents have their own city-wide web which is only accessed by locals and doesn't allow for intergovernmental or interregional (regions under the same government) access. I wonder if someone has built a computer program on the same principles as the large language model to deter access by other computer programs based on the large language model. The best answer will be the advance in basic memory storage from quantum computers or other technologies, but that technology is, in all earnest, more expensive.

OpenAI suggests they want it to be safe, but it has an automatic dysfunction they can't control: humans. I can tell from their literature below that they see this as a corporate tool in the near future, yes, paid customers, but they are wary of the truly public internet, because anything connected can be manipulated. A system that users lean on as a heavier crutch while traveling throughout the internet, picking up various little programs here or there, will make every user of it more damaging to the security of the system. Now, OpenAI will do its best to be as secure as possible, but the reality is that no one can defeat the dysfunction of the internet itself, which is its public connectivity. So walls will be needed: a counter-internet movement where some will have private data stores, with units that can access them only through wire and are specifically engineered for the purpose. Maybe even quantum computing wire connections. Local webs supported by private information stores, intentionally isolated from the world wide web. I end with one point: human design inefficiency is at the heart of all of this.
The internet itself was allowed to grow corrupt, or dysfunctional, by humans who saw it as an invasive tool (governments, firms), saw it incorrectly as a tool for human unity (the idealism of many colleges, psychologists, or social scientists), or saw it as a tool to mirror science fiction's uses of computers without the lessons from the stories (Star Trek primarily, whose computers show the guidelines that computers and computer programs should have today, but... well).

IN AMENDMENT: Black people from the Americas (South, Central, North, Caribbean), Africa, and Asia love using ChatGPT. I do blame Frederick Douglass for starting it with the camera, suggesting an infatuation with technology that is embedded in the Black populace of humanity. I don't use any of it. I only use DeviantArt DreamUp, and that is only because I pay for it, and sparsely. Relying on a tool isn't bad, but one can over-rely.

And for the arts, the walls going up will be positive. At first negative, because audiences will be shocked. The first three phases of the internet (the basic era, the world wide web era, the large language model era) trained humans to see art in three ways: free, comforting, and idolic. Free, meaning people love art that is free: free to make, free to acquire, free to access, free to whatever. Paying for art has become uncommon for the masses. This is why, from porn to music videos to any literature, the money is little. The money in the arts is in live performance: adult stars who perform live on livestreams or do live events at conventions, musicians' live concerts, literature being composed live. That is the profit angle. Comforting, in that art which doesn't provide what is expected is rejected more grandly. "Give it a chance" is not dead, but so few do it, and no one needs to, as the internet allows your artistic tastes to be eternally supported; you never need to consider a different angle in any art. Lastly, idolic: if an artist is popular, these are the best of times. All the computer programs in media are designed to flow to the most popular; you see this in sports stars, musicians, and writers. The problem is the artist who isn't popular, ha, who has to find a way to become popular, and it takes more than commerciality. I know too many artists who have tried to be commercial and failed to suggest that all an artist need do today is follow trends. No...

ARTICLES

Introducing Operator

A research preview of an agent that can use its own browser to perform tasks for you. Available to Pro users in the U.S. https://operator.chatgpt.com/

July 17, 2025 update: Operator is now fully integrated into ChatGPT as ChatGPT agent. To access these updated capabilities, simply select “agent mode” from the dropdown in the composer and enter your query directly within ChatGPT. As a result, the standalone Operator site (operator.chatgpt.com) will sunset in the coming weeks.

Today we’re releasing Operator (https://operator.chatgpt.com/), an agent that can go to the web to perform tasks for you. Using its own browser, it can look at a webpage and interact with it by typing, clicking, and scrolling. It is currently a research preview, meaning it has limitations and will evolve based on user feedback. Operator is one of our first agents, which are AIs capable of doing work for you independently—you give it a task and it will execute it. Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes.
The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses. To ensure a safe and iterative rollout, we are starting small. Starting today, Operator is available to Pro users in the U.S. at operator.chatgpt.com. This research preview allows us to learn from our users and the broader ecosystem, refining and improving as we go. Our plan is to expand to Plus, Team, and Enterprise users and integrate these capabilities into ChatGPT in the future.

How Operator works

Operator is powered by a new model called Computer-Using Agent (CUA) (https://openai.com/index/computer-using-agent/). Combining GPT‑4o's vision capabilities with advanced reasoning through reinforcement learning, CUA is trained to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a screen. Operator can “see” (through screenshots) and “interact” (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations. If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct. When it gets stuck and needs assistance, it simply hands control back to the user, ensuring a smooth and collaborative experience. While CUA is still in early stages and has limitations, it sets new state-of-the-art benchmark results in WebArena and WebVoyager, two key browser use benchmarks. Read more about evals and the research behind Operator in our research blog post.

How to use

To get started, simply describe the task you’d like done and Operator can handle the rest. Users can choose to take over control of the remote browser at any point, and Operator is trained to proactively ask the user to take over for tasks that require login, payment details, or when solving CAPTCHAs. Users can personalize their workflows in Operator by adding custom instructions, either for all sites or for specific ones, such as setting preferences for airlines on Booking.com. Operator lets users save prompts for quick access on the homepage, ideal for repeated tasks like restocking groceries on Instacart. Similar to using multiple tabs on a browser, users can have Operator run multiple tasks simultaneously by creating new conversations, like ordering a personalized enamel mug on Etsy while booking a campsite on Hipcamp. (A rough sketch of how such per-site instructions could be represented appears after this section.)

Ecosystem & users

Operator transforms AI from a passive tool to an active participant in the digital ecosystem. It will streamline tasks for users and bring the benefits of agents to companies that want innovative customer experiences and desire higher rates of conversion. We’re collaborating with companies like DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, Uber, and others to ensure Operator addresses real-world needs while respecting established norms. In addition to these collaborations, we see a lot of potential to improve the accessibility and efficiency of certain workflows, particularly in public sector applications. To explore these use cases further, we’re working with organizations like the City of Stockton (https://www.stocktonca.gov/) to make it easier to enroll in city services and programs.
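To make the personalization features above concrete, here is a minimal sketch of how per-site custom instructions and saved prompts could be represented. The structure, keys, helper, and example text are assumptions for illustration only; OpenAI has not published Operator's actual settings format.

```python
# Hypothetical representation of Operator-style per-site custom instructions
# and saved prompts. Everything here is invented for illustration and does
# not describe Operator's real configuration.

CUSTOM_INSTRUCTIONS = {
    "*": "Always show prices in USD and prefer options with free cancellation.",
    "booking.com": "Prefer aisle seats and Star Alliance airlines.",
    "instacart.com": "Substitute organic produce when an item is out of stock.",
}

SAVED_PROMPTS = [
    "Restock last week's groceries on Instacart.",
    "Order a personalized enamel mug on Etsy.",
]


def instructions_for(domain: str) -> str:
    """Combine the global ('*') instructions with any site-specific ones."""
    parts = [CUSTOM_INSTRUCTIONS["*"]]
    if domain in CUSTOM_INSTRUCTIONS:
        parts.append(CUSTOM_INSTRUCTIONS[domain])
    return " ".join(parts)


# Example: the combined guidance an agent might carry onto booking.com.
print(instructions_for("booking.com"))
```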
By releasing Operator to a limited audience initially, we aim to learn quickly and refine its capabilities based on real-world feedback, ensuring we balance innovation with trust and safety. This collaborative approach helps ensure Operator delivers meaningful value to users, creators, businesses, and public sector organizations alike.

Safety and privacy

Ensuring Operator is safe to use is a top priority, with three layers of safeguards to prevent abuse and ensure users are firmly in control.

First, Operator is trained to ensure that the person using it is always in control and asks for input at critical points.
Takeover mode: Operator asks the user to take over when inputting sensitive information into the browser, such as login credentials or payment information. When in takeover mode, Operator does not collect or screenshot information entered by the user.
User confirmations: Before finalizing any significant action, such as submitting an order or sending an email, Operator should ask for approval.
Task limitations: Operator is trained to decline certain sensitive tasks, such as banking transactions or those requiring high-stakes decisions, like making a decision on a job application.
Watch mode: On particularly sensitive sites, such as email or financial services, Operator requires close supervision of its actions, allowing users to directly catch any potential mistakes.

Next, we’ve made it easy to manage data privacy in Operator.
Training opt out: Turning off ‘Improve the model for everyone’ in ChatGPT settings means data in Operator will also not be used to train our models.
Transparent data management: Users can delete all browsing data and log out of all sites with one click under the Privacy section of Operator settings. Past conversations in Operator can also be deleted with one click.

Lastly, we’ve built defenses against adversarial websites that may try to mislead Operator through hidden prompts, malicious code, or phishing attempts:
Cautious navigation: Operator is designed to detect and ignore prompt injections.
Monitoring: A dedicated “monitor model” watches for suspicious behavior and can pause the task if something seems off.
Detection pipeline: Automated and human review processes continuously identify new threats and quickly update safeguards.

We know bad actors may try to misuse this technology. That’s why we’ve designed Operator to refuse harmful requests and block disallowed content. Our moderation systems can issue warnings or even revoke access for repeated violations, and we’ve integrated additional review processes to detect and address misuse. We’re also providing guidance (https://openai.com/policies/using-chatgpt-agent-in-line-with-our-policies/) on how to interact with Operator in compliance with our Usage Policies (https://openai.com/policies/usage-policies/). While Operator is designed with these safeguards, no system is flawless and this is still a research preview; we are committed to continuous improvement through real-world feedback and rigorous testing. For more on our approach, visit the safety section of the Operator research blog.

Limitations

Operator is currently in an early research preview, and while it’s already capable of handling a wide range of tasks, it’s still learning, evolving and may make mistakes. For instance, it currently encounters challenges with complex interfaces like creating slideshows or managing calendars.
Early user feedback will play a vital role in enhancing its accuracy, reliability, and safety, helping us make Operator better for everyone. What's next CUA in the API: We plan to expose the model powering Operator, CUA, in the API soon so that developers can use it to build their own computer-using agents. Enhanced Capabilities: We’ll continue to improve Operator’s ability to handle longer and more complex workflows. Wider Access: We plan to expand Operator⁠(opens in a new window) to Plus, Team, and Enterprise users and integrate its capabilities directly into ChatGPT in the future once we are confident in its safety and usability at scale, unlocking seamless real-time and asynchronous task execution. Authors OpenAI Foundational research contributors Casey Chu, David Medina, Hyeonwoo Noh, Noah Jorgensen, Reiichiro Nakano, Sarah Yoo Core Andrew Howell, Aaron Schlesinger, Baishen Xu, Ben Newhouse, Bobby Stocker, Devashish Tyagi, Dibyo Majumdar, Eugenio Panero, Fereshte Khani, Geoffrey Iyer, Jiahui Yu, Nick Fiacco, Patrick Goethe, Sam Jau, Shunyu Yao, Stephan Casas, Yash Kumar, Yilong Qin XFN Contributors Abby Fanlo Susk, Aleah Houze, Alex Beutel, Alexander Prokofiev, Andrea Vallone, Andrea Chan, Christina Lim, Derek Chen, Duke Kim, Grace Zhao, Heather Whitney, Houda Nait El Barj, Jake Brill, Jeremy Fine, Joe Fireman, Kelly Stirman, Lauren Yang, Lindsay McCallum, Leo Liu, Mike Starr, Minnia Feng, Mostafa Rohaninejad, Oleg Boiko, Owen Campbell-Moore, Paul Ashbourne, Stephen Imm, Taylor Gordon, Tina Sriskandarajah, Winston Howes Leads Aaron Schlesinger (Infrastructure), Casey Chu (Safety and Model Readiness), David Medina (Research Infrastructure), Hyeonwoo Noh (Overall Research), Reiichiro Nakano (Overall Research), Yash Kumar Contributors Adam Brandon, Adam Koppel, Adele Li, Ahmed El-Kishky, Akila Welihinda, Alex Karpenko, Alex Nawar, Alex Tachard Passos, Amelia Liu, Andrei Gheorghe, Andrew Duberstein, Andrey Mishchenko, Angela Baek, Ankush Agarwal, Anting Shen, Antoni Baum, Ari Seff, Ashley Tyra, Behrooz Ghorbani, Bo Xu, Brandon McKinzie, Bryan Brandow, Carolina Paz, Cary Hudson, Chak Li, Chelsea Voss, Chen Shen, Chris Koch, Christian Gibson, Christina Kim, Christine McLeavey, Claudia Fischer, Cory Decareaux, Daniel Jacobowitz, Daniel Wolf, David Kjelkerud, David Li, Ehsan Asdar, Elaine Kim, Emilee Goo, Eric Antonow, Eric Hunter, Eric Wallace, Felipe Torres, Fotis Chantzis, Freddie Sulit, Giambattista Parascandolo, Hadi Salman, Haiming Bao, Haoyu Wang, Henry Aspegren, Hyung Won Chung, Ian O’Connell, Ian Sohl, Isabella Fulford, Jake McNeil, James Donovan, Jamie Kiros, Jason Ai, Jason Fedor, Jason Wei, Jay Dixit, Jeffrey Han, Jeffrey Sabin-Matsumoto, Jennifer Griffith-Delgado, Jeramy Han, Jeremiah Currier, Ji Lin, Jiajia Han, Jiaming Zhang, Jiayi Weng, Jieqi Yu, Joanne Jang, Joyce Ruffell, Kai Chen, Kai Xiao, Kevin Button, Kevin King, Kevin Liu, Kristian Georgiev, Kyle Miller, Lama Ahmad, Laurance Fauconnet, Leonard Bogdonoff, Long Ouyang, Louis Feuvrier, Madelaine Boyd, Mamie Rheingold, Matt Jones, Michael Sharman, Miles Wang, Mingxuan Wang, Nick Cooper, Niko Felix, Nikunj Handa, Noel Bundick, Pedro Aguilar, Peter Faiman, Peter Hoeschele, Pranav Deshpande, Raul Puri, Raz Gaon, Reid Gustin, Robin Brown, Rob Honsby, Saachi Jain, Sandhini Agarwal, Scott Ethersmith, Scott Lessans, Shauna O’Brien, Spencer Papay, Steve Coffey, Tal Stramer, Tao Wang, Teddy Lee, Tejal Patwardhan, Thomas Degry, Tomo Hiratsuka, Troy Peterson, Wenda Zhou, William Butler, Wyatt Thompson, Yao Zhou, Yaodong Yu, Yi Cheng, 
Yinghai Lu, Younghoon Kim, Yu-Ann Wang Madan, Yushi Wang, Zhiqing Sun

Leadership
Anna Makanju, Greg Brockman, Hannah Wong, Jerry Tworek, Liam Fedus, Mark Chen, Peter Welinder, Sam Altman, Wojciech Zaremba

URL https://openai.com/index/introducing-operator/

Computer-Using Agent

Powering Operator with Computer-Using Agent, a universal interface for AI to interact with the digital world.

Today we introduced a research preview of Operator, an agent that can go to the web to perform tasks for you. Powering Operator is Computer-Using Agent (CUA), a model that combines GPT‑4o's vision capabilities with advanced reasoning through reinforcement learning. CUA is trained to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a screen—just as humans do. This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs.

CUA builds off of years of foundational research at the intersection of multimodal understanding and reasoning. By combining advanced GUI perception with structured problem-solving, it can break tasks into multi-step plans and adaptively self-correct when challenges arise. This capability marks the next step in AI development, allowing models to use the same tools humans rely on daily and opening the door to a vast range of new applications.

While CUA is still early and has limitations, it sets new state-of-the-art benchmark results, achieving a 38.1% success rate on OSWorld for full computer use tasks, and 58.1% on WebArena and 87% on WebVoyager for web-based tasks. These results highlight CUA’s ability to navigate and operate across diverse environments using a single general action space.

We’ve developed CUA with safety as a top priority to address the challenges posed by an agent having access to the digital world, as detailed in our Operator System Card (https://openai.com/index/operator-system-card/). In line with our iterative deployment strategy, we are releasing CUA through a research preview of Operator at operator.chatgpt.com for Pro tier users in the U.S. to start. By gathering real-world feedback, we can refine safety measures and continuously improve as we prepare for a future with increasing use of digital agents.

How it works

CUA processes raw pixel data to understand what’s happening on the screen and uses a virtual mouse and keyboard to complete actions. It can navigate multi-step tasks, handle errors, and adapt to unexpected changes. This enables CUA to act in a wide range of digital environments, performing tasks like filling out forms and navigating websites without needing specialized APIs.

Given a user’s instruction, CUA operates through an iterative loop that integrates perception, reasoning, and action:
Perception: Screenshots from the computer are added to the model’s context, providing a visual snapshot of the computer's current state.
Reasoning: CUA reasons through the next steps using chain-of-thought, taking into consideration current and past screenshots and actions. This inner monologue improves task performance by enabling the model to evaluate its observations, track intermediate steps, and adapt dynamically.
Action: It performs the actions—clicking, scrolling, or typing—until it decides that the task is completed or user input is needed. While it handles most steps automatically, CUA seeks user confirmation for sensitive actions, such as entering login details or responding to CAPTCHA forms.
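The perception-reasoning-action loop described above maps naturally onto a simple control loop. The following is a rough sketch under assumed, hypothetical helpers (capture_screenshot, decide_next_action, execute); it illustrates the general pattern only and is not OpenAI's implementation or API.

```python
# Hypothetical sketch of a perception-reasoning-action loop for a
# computer-using agent. The callables passed in stand in for whatever
# screenshot capture, model call, and mouse/keyboard layer a real system
# would use; none of this is OpenAI's published code.

from dataclasses import dataclass, field


@dataclass
class AgentState:
    task: str
    history: list = field(default_factory=list)  # past screenshots and actions


def run_agent(task, capture_screenshot, decide_next_action, execute, max_steps=50):
    """Loop: perceive the screen, reason about the next step, act.

    decide_next_action plays the role of the model: it returns an action
    dict such as {"type": "click", "x": 100, "y": 200}, {"type": "ask_user"}
    for sensitive steps (logins, payments, CAPTCHAs), or {"type": "done"}.
    """
    state = AgentState(task=task)
    for _ in range(max_steps):
        # Perception: add the current screen to the model's context.
        state.history.append({"screenshot": capture_screenshot()})

        # Reasoning: the model weighs current and past observations.
        action = decide_next_action(state)

        # Action: finish, hand control back to the user, or drive mouse/keyboard.
        if action["type"] == "done":
            break
        if action["type"] == "ask_user":
            break  # sensitive step: defer to the user before continuing
        execute(action)
        state.history.append({"action": action})
    return state
```

The max_steps budget also gives an intuition for the test-time scaling result mentioned in the evaluations below: allowing more loop iterations gives the agent more chances to recover from its own mistakes.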
Evaluations

CUA establishes a new state-of-the-art in both computer use and browser use benchmarks by using the same universal interface of screen, mouse, and keyboard. Evaluation details are described here: https://cdn.openai.com/cua/CUA_eval_extra_information.pdf

Browser use

WebArena (https://arxiv.org/abs/2307.13854) and WebVoyager (https://arxiv.org/abs/2401.13919) are designed to evaluate the performance of web browsing agents in completing real-world tasks using browsers. WebArena utilizes self-hosted open-source websites offline to imitate real-world scenarios in e-commerce, online store content management (CMS), social forum platforms, and more. WebVoyager tests the model’s performance on online live websites like Amazon, GitHub, and Google Maps. In these benchmarks, CUA sets a new standard using the same universal interface that perceives the browser screen as pixels and takes action through mouse and keyboard. CUA achieved a 58.1% success rate on WebArena and an 87% success rate on WebVoyager for web-based tasks. While CUA achieves a high success rate on WebVoyager, where most tasks are relatively simple, it still needs further improvement to close the gap with human performance on more complex benchmarks like WebArena.

Computer use

OSWorld (https://arxiv.org/abs/2404.07972) is a benchmark that evaluates models’ ability to control full operating systems like Ubuntu, Windows, and macOS. In this benchmark, CUA achieves a 38.1% success rate. We observed test-time scaling, meaning CUA’s performance improves when more steps are allowed. The figure below compares CUA’s performance with previous state-of-the-art results at varying maximum allowed steps. Human performance on this benchmark is 72.4%, so there is still significant room for improvement. The following visualizations show examples of CUA navigating a variety of standardized OSWorld tasks.

CUA in Operator

We’re making CUA available through a research preview of Operator, an agent that can go to the web to perform tasks for you. Operator is available to Pro users in the U.S. at operator.chatgpt.com. This research preview is an opportunity to learn from our users and the broader ecosystem, refining and improving Operator iteratively. As with any early-stage technology, we don’t expect CUA to perform reliably in all scenarios just yet. However, it has already proven useful in a variety of cases, and we aim to extend that reliability across a wider range of tasks. By releasing CUA in Operator, we hope to gather valuable insights from our users, which will guide us in refining its capabilities and expanding its applications.

In the table below, we present CUA’s performance in Operator on a handful of trials per prompt to illustrate its known strengths and weaknesses.

Category: Interacting with various UI components to accomplish tasks
Note: CUA can interact with various UI components to search, sort, and filter results to find the information that users want. Reliability varies across different websites and UIs.
Prompt (10 / 10): "Turn 1: Search Britannica for a detailed map view of bear habitats. Turn 2: Great! Now please check out the black, brown and polar bear links and provide a concise general overview of their physical characteristics, specifically their differences. Oh and save the links for me so I can access them quickly."
Prompt (9 / 10): "I want one of those Target deals. Can you check if they have a deal on poppi prebiotic sodas? If they do, I want the watermelon flavor in the 12 fl oz can. Get me the type of deal that comes with this and check if it's gluten free."
Prompt (3 / 10): "I am planning to shift to Seattle and I want you to search Redfin for a townhouse with at least 3 bedrooms, 2 bathrooms, and an energy-efficient design (e.g., solar panels or LEED-certified). My budget is between $600,000 - $800,000 and it should ideally be close to 1500 sq ft."

Category: Tasks that can be accomplished through repeated simple UI interactions
Note: CUA can reliably repeat simple UI interactions multiple times to automate simple but tedious tasks for users.
Prompt (10 / 10): "Create a new project in Todoist titled 'Weekend Grocery Shopping.' Add the following shopping list with products: Bananas (6 pieces), Avocados (2 ripe), Baby Spinach (1 bag), Whole Milk (1 gallon), Cheddar Cheese (8 oz block), Potato Chips (Salted, family size), Dark Chocolate (70% cocoa, 2 bars)."
Prompt (10 / 10): "Search Spotify for the most popular songs of the USA for the 1990s, and create a playlist with at least 10 tracks."

Category: Tasks where CUA shows a high success rate only if prompts include detailed hints on how to use the website
Note: Even for the same task, CUA’s reliability can change depending on how the task is prompted. In this case, reliability improves by providing specifics of date (e.g., "9 am to 12 am" rather than "the entire day from 9 am") and by providing hints on which UI should be used to find results (e.g., "check the filters section ...").
Prompt (8 / 10): "Visit tagvenue.com and look for a concert hall that seats 150 people in London. I need it on Feb 22 2025 for the entire day from 9 am to 12 am, just make sure it is under £90 per hour. Oh could you check the filters section for appropriate filters and make sure there is parking and the entire thing is wheelchair accessible."
Prompt (3 / 10): "Visit tagvenue.com and look for a concert hall that seats 150 people in London. I need it on Feb 22 2025 for the entire day from 9 am, just make sure it is under £90 per hour. Oh and make sure there is parking and the entire thing is wheelchair accessible."

Category: Struggling to use unfamiliar UI and text editing
Note: When CUA has to interact with UIs it hasn't encountered much during training, it struggles to figure out how to use them appropriately, which often results in a lot of trial and error and inefficient actions. CUA is also not precise at text editing; it often makes mistakes in the process or produces output with errors.
Prompt (4 / 10): "Use html5editor and input the following text on the left side, then edit it following my instructions and give me a screenshot of the entire thing when done. The text is: 'Hello world! This is my first text. I need to see how it would look like when programmed with HTML. Some parts should be red. Some bold. Some italic. Some underlined. Until my lesson is complete, and we shift to the other side.' ... 'Hello world!' should have header 2 applied. The sentence below it should be regular paragraph text. The sentence mentioning red should be normal text and red. The sentence mentioning bold should be normal text, bolded. The sentence mentioning italic should be italicized. The final sentence should be aligned to the right instead of the usual left."

Safety

Because CUA is one of our first agentic products with the ability to directly take actions in a browser, it brings new risks and challenges to address.
As we prepared for deployment of Operator, we did extensive safety testing and implemented mitigations across three major classes of safety risks: misuse, model mistakes, and frontier risks. We believe it is important to take a layered approach to safety, so we implemented safeguards across the whole deployment context: the CUA model itself, the Operator system, and post-deployment processes. The aim is to have mitigations that stack, with each layer incrementally reducing the risk profile.

The first category of risk is misuse. In addition to requiring users to comply with our Usage Policies, we have designed the following mitigations to reduce Operator’s risk of harm due to misuse, building off our safety work for GPT‑4o (https://openai.com/index/gpt-4o-system-card/):
Refusals: The CUA model is trained to refuse many harmful tasks and illegal or regulated activities.
Blocklist: Operator cannot access websites that we’ve preemptively blocked, such as many gambling sites, adult entertainment, and drug or gun retailers.
Moderation: User interactions are reviewed in real-time by automated safety checkers that are designed to ensure compliance with Usage Policies and have the ability to issue warnings or blocks for prohibited activities.
Offline detection: We’ve also developed automated detection and human review pipelines to identify prohibited usage in priority policy areas, including child safety and deceptive activities, allowing us to enforce our Usage Policies.

The second category of risk is model mistakes, where the CUA model accidentally takes an action that the user didn’t intend, which in turn causes harm to the user or others. Hypothetical mistakes can range in severity, from a typo in an email, to purchasing the wrong item, to permanently deleting an important document. To minimize potential harm, we’ve developed the following mitigations:
User confirmations: The CUA model is trained to ask for user confirmation before finalizing tasks with external side effects, for example before submitting an order, sending an email, etc., so that the user can double-check the model’s work before it becomes permanent.
Limitations on tasks: For now, the CUA model will decline to help with certain higher-risk tasks, like banking transactions and tasks that require sensitive decision-making.
Watch mode: On particularly sensitive websites, such as email, Operator requires active user supervision, ensuring users can directly catch and address any potential mistakes the model might make.

One particularly important category of model mistakes is adversarial attacks on websites that cause the CUA model to take unintended actions, through prompt injections, jailbreaks, and phishing attempts. In addition to the aforementioned mitigations against model mistakes, we developed several additional layers of defense to protect against these risks:
Cautious navigation: The CUA model is designed to identify and ignore prompt injections on websites, recognizing all but one case from an early internal red-teaming session.
Monitoring: In Operator, we’ve implemented an additional model to monitor and pause execution if it detects suspicious content on the screen.
Detection pipeline: We’re applying both automated detection and human review pipelines to identify suspicious access patterns that can be flagged and rapidly added to the monitor (in a matter of hours).
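One way to picture how these stacked mitigations compose is a per-step screening function that checks each layer in turn before the agent acts. The sketch below is illustrative only: the classifier names, domain list, and return labels are assumptions, not OpenAI's actual system.

```python
# Hypothetical illustration of layered safeguards around each agent step:
# task-level refusal, a domain blocklist, and a separate monitor model that
# pauses on suspicious on-screen content. All names, domains, and labels are
# invented for this sketch and do not describe OpenAI's implementation.

BLOCKED_DOMAINS = {"example-gambling.test", "example-gun-retailer.test"}


def domain_of(url: str) -> str:
    # Minimal helper for the sketch; real code would use urllib.parse.
    return url.split("//")[-1].split("/")[0].lower()


def screen_step(task_text, url, screen_text, refusal_classifier, monitor_model):
    """Return 'refuse', 'block', 'pause', or 'proceed' for the next step."""
    # Layer 1: refuse harmful or disallowed tasks outright.
    if refusal_classifier(task_text) == "disallowed":
        return "refuse"

    # Layer 2: never navigate to preemptively blocked sites.
    if domain_of(url) in BLOCKED_DOMAINS:
        return "block"

    # Layer 3: a dedicated monitor model pauses execution when on-screen
    # content looks like a prompt injection or phishing attempt.
    if monitor_model(screen_text) == "suspicious":
        return "pause"

    return "proceed"
```

A real deployment would add the post-hoc layers described above, such as offline detection and human review, which operate outside the live loop.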
Finally, we evaluated the CUA model against frontier risks outlined in our Preparedness Framework (https://cdn.openai.com/openai-preparedness-framework-beta.pdf), including scenarios involving autonomous replication and biorisk tooling. These assessments showed no incremental risk on top of GPT‑4o. For those interested in exploring the evaluations and safeguards in more detail, we encourage you to review the Operator System Card, a living document that provides transparency into our safety approach and ongoing improvements. As many of Operator’s capabilities are new, so are the risks and mitigation approaches we’ve implemented. While we have aimed for state-of-the-art, diverse and complementary mitigations, we expect these risks and our approach to evolve as we learn more. We look forward to using the research preview period as an opportunity to gather user feedback, refine our safeguards, and enhance agentic safety.

Conclusion

CUA builds on years of research advancements in multimodality, reasoning and safety. We have made significant progress in deep reasoning through the o-model series, vision capabilities through GPT‑4o, and new techniques to improve robustness through reinforcement learning and instruction hierarchy (https://openai.com/index/the-instruction-hierarchy/). The next challenge space we plan to explore is expanding the action space of agents. The flexibility offered by a universal interface addresses this challenge, enabling an agent that can navigate any software tool designed for humans. By moving beyond specialized agent-friendly APIs, CUA can adapt to whatever computer environment is available—truly addressing the “long tail” of digital use cases that remain out of reach for most AI models. We're also working to make CUA available in the API (https://platform.openai.com/), so developers can use it to build their own computer-using agents. As we continue to iterate on CUA, we look forward to seeing the different use cases the community will discover. We plan to use the real-world feedback we gather from this early preview to continuously refine CUA’s capabilities and safety mitigations to safely advance our mission of distributing the benefits of AI to everyone.

Authors
OpenAI

References
Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku (https://www.anthropic.com/news/3-5-models-and-computer-use)
Model Card Addendum: Claude 3.5 Haiku and Upgraded Claude 3.5 Sonnet (https://assets.anthropic.com/m/1cd9d098ac3e6467/original/Claude-3-Model-Card-October-Addendum.pdf)
Kura WebVoyager benchmark (https://www.trykura.com/benchmarks)
Google Project Mariner (https://deepmind.google/technologies/project-mariner/)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (https://os-world.github.io/)
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models (https://arxiv.org/abs/2401.13919)
WebArena: A Realistic Web Environment for Building Autonomous Agents (https://webarena.dev/)

Citations
Please cite OpenAI and use the following BibTeX for citation: http://cdn.openai.com/cua/cua2025.bib

URL https://openai.com/index/computer-using-agent/

OpenAI’s new AI browser could rival Perplexity — here’s what I hope it gets right
Story by Amanda Caswell

OpenAI is building a brand-new web browser, and it could completely change how we search, browse and get things done online.
According to recent leaks and an exclusive report from Reuters, the company behind ChatGPT is working on a Chromium-based browser that integrates AI agents directly into your browsing experience. Internally codenamed “Operator,” this new browser is expected to go far beyond search to offer smart, memory-equipped agents that can summarize pages, complete actions (like booking travel) and eventually handle full web-based tasks for you. If this sounds like Perplexity’s Comet, you’re right. The recently launched AI-powered browser integrates search and sidebar answers directly into the page. OpenAI’s browser will likely compete with Chrome and Comet, but hasn’t launched yet. It’s rumored to be rolling out first to ChatGPT Plus subscribers in the U.S. as part of an early beta, possibly later this summer. As someone who tests AI tools for a living, I’ve tried nearly every smart assistant and search engine on the market. And while Perplexity’s Comet offers a solid first look at the future of AI browsing, here’s what I’m most excited for from OpenAI’s take, and what I hope it gets right.

1. A truly proactive browsing assistant
Perplexity is great at answering questions. But what I want from OpenAI’s browser is something more autonomous: an assistant that doesn't just wait for a prompt but actively enhances the page I'm on. Imagine browsing Amazon and having the assistant automatically suggest product comparisons or pull in real reviews from Reddit. Or reading a news article and instantly seeing a timeline, source context and differing viewpoints, but with zero prompting. That level of proactive help could turn passive browsing into intelligent discovery and I’m totally here for it.

2. Built-in agents that take action
OpenAI’s “Operator” agents are rumored to handle full tasks beyond search or summarization. For instance, filling out forms, booking tickets or handling customer service chats will all be done for you. If that’s true, it’s a major leap forward. While Perplexity’s Comet is great for pulling in answers, OpenAI’s approach may introduce a new category of browser-based automation powered by memory, context and reasoning.

3. Cleaner answers, better sources
Let’s be honest: search engines today are filled with AI-generated slop, vague product listicles, SEO junk and misleading clickbait. Perplexity tries to solve this by pulling answers from verified sources and citing them in real time. OpenAI could go even further, drawing from its own training data and web browsing capabilities to offer cleaner, more nuanced summaries with source-level transparency. If they can combine the conversational intelligence of ChatGPT with web accuracy, it could help reverse the search spam crisis.

4. One tab to rule them all
If OpenAI’s browser integrates with ChatGPT’s existing multimodal tools, including everything from image generation to spreadsheet analysis and file uploads, it could become the first true all-in-one productivity browser. That would give creators, students and professionals a seamless way to write, code, search, design and automate within one interface.

The bottom line
Perplexity’s Comet browser is a strong first step toward smarter web browsing. But OpenAI’s rumored browser has the potential to go further by offering a more intelligent, personalized and action-ready browsing experience. I’ll be watching closely for the beta invite to drop.
And if it delivers on the promise of proactive agents, real web automation and a cleaner, more useful internet, this could be the most exciting browser launch since Chrome. URL https://www.msn.com/en-us/news/technology/openai-s-new-ai-browser-could-rival-perplexity-here-s-what-i-hope-it-gets-right/ar-AA1Iou29?ocid=BingNewsSerp