• Poor Man's Computer Use with Execute Arbitrary AppleScript MCP Server

    Disclaimer: you should definitely not do this!

    I have been playing with Model Context Protocol and just realized you can get proof-of-concept-level computer use with very minimal code if you give the MCP client the ability to execute AppleScript and take screenshots. With just these rough tools you can coax some agentic behavior with a feedback loop.

    #!/usr/bin/env python3
    from mcp.server.fastmcp import FastMCP, Image
    import os
    import subprocess
    
    # Initialize the MCP server with a chosen name
    mcp = FastMCP("applescript_server", dependencies=["pyautogui", "Pillow"])
    
    @mcp.tool()
    def applescript_run(script: str) -> dict:
        """
        Executes arbitrary AppleScript via osascript.
    
        Args:
            script (str): The AppleScript code to execute.
    
        Returns:
            dict: A dictionary containing stdout, stderr, and the return code.
        """
        try:
            # Run the AppleScript command using osascript
            proc = subprocess.run(
                ['osascript', '-e', script],
                capture_output=True,
                text=True,
                check=False  # Allow non-zero exit codes to be returned in the response
            )
            return {
                "stdout": proc.stdout.strip(),
                "stderr": proc.stderr.strip(),
                "returncode": proc.returncode
            }
        except Exception as e:
            return {"error": str(e)}
    
    @mcp.tool()
    def take_screenshot() -> Image:
        """
        Take a screenshot using AppleScript to execute macOS' screencapture,
        forcing JPEG output. If the JPEG data exceeds 1MB, downscale the image
        to reduce its size.
        """
        import io, tempfile, os
        from PIL import Image as PILImage
    
        # Create a temporary file with a .jpg suffix.
        with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp_file:
            tmp_filename = tmp_file.name
    
        # Use AppleScript to run the screencapture with JPEG output.
        script = f'do shell script "screencapture -t jpg -x \\"{tmp_filename}\\""'
        result = applescript_run(script=script)
    
        if result.get("returncode", 0) != 0:
            error_msg = result.get("stderr") or "Unknown error during screenshot capture"
            return {"error": f"Screenshot failed: {error_msg}"}
    
        try:
            # Open the captured image
            img = PILImage.open(tmp_filename)
    
            # Function to save image to JPEG buffer with compression.
            def save_to_buffer(image):
                buf = io.BytesIO()
                image.save(buf, format="JPEG", quality=60, optimize=True)
                return buf.getvalue()
    
            # Save image and check size.
            image_data = save_to_buffer(img)
            max_allowed = 1048576  # 1MB
    
            if len(image_data) > max_allowed:
                # Downscale the image to reduce size.
                new_size = (img.width // 2, img.height // 2)
                img = img.resize(new_size, PILImage.LANCZOS)
                image_data = save_to_buffer(img)
        except Exception as e:
            return {"error": f"Error processing screenshot file: {e}"}
        finally:
            try:
                os.remove(tmp_filename)
            except Exception:
                pass
    
        return Image(data=image_data, format="jpeg")
    
    if __name__ == '__main__':
        # Run the server using stdio transport so that it can be invoked by local MCP clients
        mcp.run(transport="stdio")
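
    Before pointing an MCP client at this, it can be handy to sanity-check the tools by calling them directly; the decorated functions remain plain Python callables (take_screenshot above already relies on that when it calls applescript_run). A rough sketch, assuming the server above is saved as applescript_server.py:

    # sanity_check.py -- assumes the server above is saved as applescript_server.py
    from applescript_server import applescript_run, take_screenshot

    # Ask Finder for the name of its frontmost window; any AppleScript works here.
    print(applescript_run('tell application "Finder" to get name of front window'))

    # Grab a screenshot; on success this is an MCP Image, on failure an error dict.
    shot = take_screenshot()
    print(type(shot))

    From there it's just a matter of registering the server over stdio with whatever MCP client you're using.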
    

  • Please don’t disable paste

    It’s tax season. I unfortunately had to log in to the Philadelphia Revenue Department’s website. There are a lot of things that could be said about the city’s web UX, but I’ll save those thoughts for now. What I want to share is a plea.

    Please don’t disable paste. A pattern I ran into there, and have seen on many bureaucratic sites, is two fields like this:

    • Enter <some-info>
    • Re-enter <some-info> to confirm

    With paste disabled on the second field. I guess the developers don’t want users to paste a copied typo? But (A) I’m less likely to make a mistake if I’m copying and pasting data directly from another source, and (B) on the off chance the data is wrong, I’d rather pay the low-probability price of a round trip for a backend error than the guaranteed price of tediously typing out some string. Or you could do client-side mismatch detection.

    With mobile browsers this becomes even more important since I can even copy text directly from photos of physical docs thanks to built-in OCR, and typing on the phone keyboard is especially annoying. Allowing paste is a small thing that can improve user experience cheaply.


  • Blogging via Email

    I’d like to blog more, and specifically to write more short-form posts, but there’s sometimes an impedance mismatch between the maturity of a potential idea (low) and the effort required to create the corresponding post (high). It can feel like I have to get the F-35 out of the hangar for a trip to the corner.

    I was originally drawn to Jekyll for blogging (posts were plain text and portable, it was customizable, had plugins, was easy to host, etc.). But that setup meant publishing a post involved — beyond actually writing the article — committing it to git, SSHing to my VPS, syncing the changes, and running a deployment command. Of course that could all be automated, but even needing my laptop handy was a small hurdle that required additional motivation. I still do like having my writing stored this way, and I have written workflow tooling previously.

    I recalled The Past, before certain social media sites had their attention-extraction dials turned all the way up, when I was still a regular user. The ability to make posts directly from my phone or any browser offered such a low barrier to entry that I could share ideas a lot more freely.

    Maybe it’s a good thing that there’s an effective filter saving the internet from more literal low-effort posts, hot takes, etc. But I want this blog to be a place where I can share ideas, even if some of those ideas might be trite or underdeveloped. Basically I am embracing quantity over quality, because if it’s easier to write more, then I will write more, which will improve my writing (and writing is thinking).

    So recently I wanted to make blogging easier while retaining Jekyll as the underlying blog generator. I like the idea of using email as an interface here, as it’s a cheap way to get rich text post editing and drafting. I wrote mail2blog, a small utility that reads an IMAP mailbox and creates posts from the emails. With a bit of extra scripting I can automatically publish these posts on a cron.
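
    The core of the idea is small. The following isn’t mail2blog itself, just a rough sketch of the shape, assuming an IMAP inbox and a standard Jekyll _posts directory (host, credentials, and paths are placeholders):

    #!/usr/bin/env python3
    # Rough sketch (not mail2blog itself): pull unread messages from an IMAP
    # inbox and write each one out as a Jekyll post with minimal front matter.
    import email
    import imaplib
    from datetime import date
    from email.header import decode_header
    from pathlib import Path

    POSTS_DIR = Path("~/blog/_posts").expanduser()  # placeholder Jekyll posts dir

    with imaplib.IMAP4_SSL("imap.example.com") as imap:   # placeholder host
        imap.login("me@example.com", "app-password")       # placeholder credentials
        imap.select("INBOX")
        _, data = imap.search(None, "UNSEEN")
        for num in data[0].split():
            # A plain FETCH marks the message \Seen, so it won't be picked up again.
            _, msg_data = imap.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(msg_data[0][1])

            # The subject becomes the post title (and the filename slug).
            subject, enc = decode_header(msg.get("Subject", ""))[0]
            if isinstance(subject, bytes):
                subject = subject.decode(enc or "utf-8")

            # Use the first text/plain part as the post body.
            body = ""
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    body = part.get_payload(decode=True).decode(errors="replace")
                    break

            slug = "-".join(subject.lower().split())[:50] or "untitled"
            post_path = POSTS_DIR / f"{date.today()}-{slug}.md"
            post_path.write_text(f'---\ntitle: "{subject}"\nlayout: post\n---\n\n{body}\n')

    A cron job that runs something like this and then rebuilds and syncs the site is roughly all the automatic publishing amounts to.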

    Hopefully this encourages me to write more, but even if not, it was at least a fun weekend project.


  • Using an E-Ink Monitor: Part 2

    This is a follow-up to my 2024 post about using the Dasung Paperlike HD-F e-ink monitor.


    DASUNG monitor with 3D printed stand hinges

    It’s spring again in Philadelphia, which means I’m dusting off my Dasung 13.3” Paperlike HD-F. This portable e-ink monitor allows me to work on my laptop outside, in full sunlight.

    Since last year I’ve made some changes to improve my experience with the monitor.

    Clearing the monitor screen programmatically

    The monitor suffers from ghosting, where an after-image of the screen contents persists faintly. This can be annoying and reduce legibility, especially as it builds up over time. There’s a physical button on the front of the monitor that resets/clears the screen. I was looking for a software solution to clear it so that I could keep my hands on the keyboard and not have to press the button, which nudges the monitor from its position resting on top of the laptop screen.

    In my previous post I reported that I couldn’t get Dasung’s PaperLikeClient software to work on my Macbook. That is still the case, but I discovered a way to clear the monitor using Lunar. With the Lunar CLI (which you can install via right clicking the GUI Lunar app menubar icon > Advanced features > Install CLI integration), you can clear the monitor using this command:

    lunar ddc PaperlikeHD 0x08 0x0603
    

    I put that into a Raycast script, so now clearing the screen is just a few keystrokes away.
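
    A Raycast script command wrapping that call is only a few lines. Something like the sketch below works (the metadata values and title are illustrative, not the exact script I use):

    #!/usr/bin/env python3
    # Required parameters:
    # @raycast.schemaVersion 1
    # @raycast.title Clear E-Ink Screen
    # @raycast.mode silent

    # Thin wrapper around the Lunar CLI command above.
    import subprocess
    subprocess.run(["lunar", "ddc", "PaperlikeHD", "0x08", "0x0603"], check=True)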

    Addressing flickering

    A Reddit poster pointed out that Apple’s temporal dithering (FRC) causes some flickering on the Dasung monitor. I did notice this once they raised it, and I tried their suggested solution of using Stillcolor. Stillcolor does indeed turn off temporal dithering, which resolved the flickering.

    Securing external monitor to laptop screen

    Last year, I had been using the Dasung monitor by basically resting it in front of the laptop’s built-in screen. This approach was less than ideal. First, the monitor would often slip and slide down over the keyboard, since there isn’t much of a lip to hold it up. Second, the monitor is rather heavy, and at certain angles it would make the laptop screen fall open to its full extent. My temporary solution was to use a bag clip to hold the monitor in place. That only sort of solved the first problem, and it didn’t work that well.


    DASUNG monitor rested on top of laptop with bag clip
    My e-ink monitor setup circa 2024


    In the fall, I roped in my mechanical engineer friend to draft and 3D print some pieces to help secure the monitor in this arrangement. We worked together on developing some hinges to (a) hold the monitor in place and (b) support the laptop screen at a specific angle.


    hinge detail
    Detail of the 3D-printed hinge/holder


    After a couple iterations, he produced these small, adjustable hinges.

    • These feature a thumbnut to allow securing the laptop hinge at a specific angle, preventing the screen from falling fully open
    • They have a pronounced vertical support that the base of the e-ink monitor rests upon, holding it up
    • There’s also a slot to allow access to the ports


    detail of hinge in use
    Close-up of one of the hinges in use


    I’ve only had the opportunity to use the monitor with these hinges a couple times, but so far they’re solving the problem splendidly. I’m looking forward to many days of working from the roof 🕶️


    laptop with monitor rested on it and hinges
    The hinges in use, supporting the e-ink monitor



  • AI and the Uncertain Future of Work

    a toaster with wings at the airport

    Software can now do something that looks a lot like thinking. So, like many knowledge workers, I’ve been guessing about the implications of AI progress for my continued employability.

    Let me start by saying that AI has already enhanced my experience as a computer user. I use ChatGPT for brainstorming, research, summarization, translation and simplification, phrasing and word-finding, cooking, trip planning, book recommendations, software development, self-reflection, and generally as a replacement for Google. At work specifically I use AI to augment what I do. It helps me understand code, write small, personal utility programs from scratch, write and refactor parts of large codebases, aid in code review, summarize text, help with writing and documentation, etc.

    How fast and to what degree will AI replace aspects of my job as a technologist? What does a senior software engineer at a SaaS company do all day? What would an AI system need to be capable of to displace its meat counterparts?

    I think that today, transformer-based deep learning foundation models like those underpinning Claude, ChatGPT, and Gemini nearly have the required raw reasoning capabilities to fulfill many of the software delivery responsibilities of a typical web developer. A simple version of that software delivery pipeline looks like this:

    linear capabilities of a swe

    While AI tools can be prompted to do some subset of those tasks in isolation–iterate on product specs, design components, write pieces of code, maybe react to test output, etc.–I don’t know of any single system that can reliably do all of those things end to end and with minimal input. Yet.

    In terms of writing software: smaller, faster models like GitHub Copilot have for years been completing basic statements and pattern-matching boilerplate; newer chatbots can reason about substantial amounts of code and write complex modules end to end; emerging products like Cursor, Windsurf, and Aider write and modify large, interconnected components. It’s not hard to imagine a black-box code modifier where a description of a change goes in and a PR with code, passing tests, and an explanation of the change comes out.

    In real life, the job isn’t a clean assembly line. The various tasks form a densely interconnected graph, and the dependencies between them form loops. There are also lots of other responsibilities not directly related to the goal of making software.

    dynamic capabilities of a swe

    It might appear that a large improvement in model capability would be needed for AI to autonomously handle all the job functions of even an average knowledge worker. I’m not convinced this is the case. Humans can use current AI to great effect by repeatedly prompting the system, incorporating outside information, and evaluating responses with empirical feedback from the world. Even if model progress halted, how much advancement could be made by incorporating feedback loops and tool use?

    The AI capability gaps are shrinking month by month. Agentic AI systems–ones that operate with autonomy and goal-directed behavior–are in development. We see this with early prototypes of multi-modal, browser-use AI systems and protocols to interface with external systems. OpenAI has reportedly been planning specialized agents to the tune of $20k/month. It’s obviously a very hard problem with lots of ambiguity, but LLMs are rather good at reasoning around ambiguity. And the potential upside for the winners that emerge will be huge.

    2024 MAD ML/AI companies

    If an AI system gets scaffolding allowing it to integrate with arbitrary services, provision infrastructure, run scripts, read outputs, deploy code, ping colleagues, and maybe even use a credit card, might it be able to approximate the output of a human worker? Even if it can’t do all of those things or do them perfectly all the time, it could still decimate the workforce. Similarly to how self-driving car companies rely on remote human operators to provide guidance in exceptional situations, perhaps the knowledge worker of tomorrow will be dropped in to nudge an AI agent in the right direction.

    Much of the work of software engineering involves person-to-person communication. We talk to product managers to understand customer pain points, we discuss tradeoffs with stakeholders, we interview candidates, and we share our ideas with others. In a world where human knowledge workers are being phased out by AI, though, such communication work becomes less common. So some of those job responsibilities might rather abruptly become irrelevant.

    All that said, I haven’t seen compelling products that can autonomously do entire jobs, except for maybe some support representative chat applications. I think it will be years before effective versions of such tools arrive, if they ever do. It’s conceivable that the tech will hit a plateau somewhere below the skill level required to take our jobs. Maybe it’s hopium or maybe I’m underestimating the rate of AI progress.

    If the pace holds, though, then at some point entry-level desk jobs will begin to be commoditized. I imagine it’s already impacting contractors who provide basic graphic design services, copywriting and editing, data entry, candidate screening, website building, and so on. This deskilling will impact the course of career development as well, since expert professionals must necessarily first be novices. How might green new grads get the requisite experience to grow into seasoned positions if the intro roles have mostly gone to machines?

    workforce training deficit

    Software is eating the world, but AI is eating software. The industry has so far witnessed a monotonically increasing demand for software–as layer after layer of abstraction made software easier to create, demand for applications and for the workers who produce them never seemed to lessen. But that software over the years was not writing itself… The technological advancement of recent AI feels like a difference in kind, not just degree.

    A time may come when the art of computer programming is regarded as a historical eccentricity rather than as a useful skill. Mercifully, there will be a messy middle where untangling the mounds of vibecoder-generated spaghetti will require professional intervention; during this time skills like software engineering and debugging will be direly needed. Beyond that, as the artificial agents are given ever larger chunks of responsibility, who knows what exactly our role as human technologists will be.

    How should software engineers prepare for the coming changes? What I am doing is paying attention and learning about AI tools: what they are, how to use them, where they succeed and where they fail. Workers effectively incorporating such tools into their practice will outperform those who resist. I don’t suggest dismissing these new capabilities as a fad. AI tools are here to stay, they’re getting more powerful and useful, and they are going to affect how we work. I believe that in the medium term, creative professionals that embrace AI will see their output increase and their tedium decrease. As usual, the world of software is changing, and we’ll have to adapt or die.

    Is a life without white-collar workers really a life worth living?

    Perhaps one day, entire companies will be run by AI agents, simulacra of human behavior. Vaguely guided by the idea of an autonomous business, I built a toy version in my free time a few months ago: an AI-run t-shirt seller. The AI would read trending t-shirt product tags, use those to generate an idea for a new t-shirt design, generate an image based on that design, and then use a bit of browser automation to upload the image to an on-demand shirt-printing marketplace. I didn’t get around to the part where the program would remix the top-selling designs to build a fashion empire, because the platform shut my account down after a couple of hours.


    a toy business run by AI

    The hardest part of the implementation was the finicky browser automation and working around captchas. Having a (reverse-engineered) API that allowed the AI-based program to upload t-shirt images gave it agency. I think we’ll see an increasing number of service providers offer API options where there had previously only been UIs, and we’ll also see UI to API translation layers enabled by AI. I imagine there might be centralized “business in a box” platforms that hook into services like Stripe, Intercom, Mailchimp, Shopify, and Docusign, giving AI agents access to a bevy of specialized tools without a human having to configure those one by one. Eventually, agentic systems will have no problem dealing with the remaining UIs directly.

    This is the dream of business owners: a machine where you put in a dime and out comes a dollar. I think it’s hard to look at the advancements in AI and not see the enticing prospect of a money printer. Business owners will increasingly seek to replace human labor with AI because the latter is much cheaper and never rests.

    the prospect of AI as a money printer

    If AI can replace software engineers, are any cognitive laborers safe? Until and unless we have ASI, there will be people at the margins who can’t be replaced: those generating new knowledge, doing the most complex research, etc. Roles that today involve human interaction may not escape unscathed either. Grantors and customers may prefer hearing from an AI to being plied by a salesperson. And those external humans may themselves be replaced by bots.

    How do things look when AIs themselves run or mostly run companies? The most glaring downside would be the displacement of millions of human workers. Robbed of their livelihoods, where would these folks get the funds to buy the widgets being churned out by robots? The middle class would evaporate, leaving extreme inequality, with the few monstrously rich wielding armies of AIs, and the rest competing for the remaining physical jobs. Not to mention that AI could accelerate advancements in robotics, endangering even manual labor. Might AIs get legal designation as artificial persons, allowing them to own businesses, property, and other assets? Will we see AI politicians? Would an AI-run government be a dystopian nightmare or would it provide an antidote to today’s sprawling bureaucracies?

    The market is an evolutionary environment not unlike the biosphere. It’s a habitat inside the noosphere occupied by firms. Evolution for animals optimizes the fitness function, selecting for species and individuals that can most effectively turn food into offspring:

    the fitness function for humans

    The fitness function for firms is similar. Their resources are capital instead of food, and the optimization selects for firms that can most effectively turn that capital into growth:

    the fitness function for companies
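
    Stated loosely in my own notation, the parallel is something like:

    \text{fitness}_{\text{organism}} \propto \frac{\text{offspring produced}}{\text{food consumed}}
    \qquad
    \text{fitness}_{\text{firm}} \propto \frac{\text{capital growth}}{\text{capital consumed}}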

    These two processes exist in the same world. We humans have hitherto had an integral and symbiotic relationship with firms, since we form them and they give us income. But if firms are increasingly run by artificial agents, the relationship changes. It becomes parasitic, adversarial. Humans and AI firms will compete for finite resources. The nature of evolutionary pressure will necessarily select for the most extractive firms. We see this today even with people-run businesses, but it will accelerate as AI firms proliferate, the pool of consumers dwindles, and antiquated human ethics take a backseat.

    Regulation could mitigate a doomsday scenario, but if deployed too early it would limit useful progress and mostly benefit the largest companies. At first, aligned AIs will act in their creators’ interest, but over time selection will reward the least scrupulous. An AI firm that routes some of its profits to lobbying for laws in its favor will do better than one that does not. Eventually, AI may develop goals that are alien to us. It may seek to explore the universe or to turn us into paperclips.

    This is an extreme and pessimistic view of where technology is headed. Given superintelligent AIs with full autonomy, maybe it could happen. I don’t really think it will, though. If artificial intelligence is going to doom us, I suspect it will be more mundane than humanity getting outcompeted by a race of smart machines. It will instead be: nefarious actors using large-scale deployments of AI agents to foment division, sowing propaganda and FUD, followed by reactive policies that strip rights and engender mistrust; algorithmically refined ads masquerading as entertainment that soak up our precious time and attention, with none left for boredom, creativity, introspection, or critical thought; convincing AI fakes that deceive us–both willingly, exacerbating the social isolation epidemic, and unwillingly, feeding the $X00B/year scam industry. A common theme of these and similar problems is technology moving faster than our collective ability to adapt to it.

    For now, I’m watching and waiting. And maintaining hope that the upshot could be net positive: technology that works with us and for us, preempts our requests, understands us, and gets out of our way. AI advancements may unlock powerful new tools for thought and enhance human cognition. They could uplift us and deliver a new promise of computing.