
How I Test New Material Without Risking a Paying Show

The Director's Eye

Written by Felix Lenhard

The first time I tested a new effect at a paying corporate event, I knew within thirty seconds that I had made a mistake.

It was a conference in Graz, a product launch event for a technology company. I had been hired to do a thirty-minute keynote with magic woven through it — my usual format. The show was solid. I had tested material that I trusted, pieces that had been through dozens of performances and that I knew would work. But I had also slipped in a new piece, an effect I had been developing for about six weeks, rehearsing in hotel rooms and performing for the mirror in my bathroom.

The new piece was, in my private rehearsals, excellent. Clean, visual, and tied to a strong thematic point about innovation and unexpected connections. I had scripted the patter. I had drilled the handling. I had visualized the audience reaction. I was confident.

Confidence, in this case, was ignorance wearing a nice suit.

The effect required a setup that made perfect sense to me but that the audience, I discovered in real time, found confusing. I could see it happening — the slight tilting of heads, the exchanged glances between colleagues, the body language that says “I am not quite following what is supposed to be happening here.” By the time I reached the climax, the audience had lost the thread. The reveal landed on a room that was not sure what they were supposed to be reacting to. I got polite applause. The kind that says “we appreciate the effort” rather than “we just saw something impossible.”

I salvaged the rest of the show with my tested material, but that new piece sat in my set like a pothole in a highway. The audience stumbled over it and then had to regain their footing, which cost me energy and momentum that took several minutes to recover.

That night, in the hotel room, I wrote two things in my notebook. First: “Never test raw material at a paying gig again.” Second: “Build a system for testing material before it goes live.”

Why Hotel Room Rehearsal Is Not Enough

Let me be clear about what hotel room rehearsal can and cannot do, because I spent a long time believing it could do more than it actually can.

Hotel rooms have been my practice studio since I first fell down the magic rabbit hole around 2016. Buying online tutorials from ellusionist.com, practicing card sleights on the hotel desk, watching myself in the bathroom mirror, drilling moves until my hands could execute them without conscious thought. Hundreds of nights across Austria and beyond, alone with a deck of cards and a laptop, building the technical foundation of my craft.

Hotel room rehearsal is excellent for technical preparation. It builds muscle memory. It refines handling. It lets you work through the physical sequence of an effect until it becomes automatic, which is the prerequisite for natural, conversational performance.

What hotel room rehearsal cannot do is test the audience experience. You cannot know how an audience will respond to a setup until you deliver that setup to actual humans. You cannot know whether your patter is clear until someone who has never heard it before tries to follow it. You cannot know whether the climax lands with impact until you see faces reacting to it in real time.

Joshua Jay makes this point powerfully in How Magicians Think when he discusses the gap between a magician’s internal experience of an effect and the audience’s external experience. The magician knows what is supposed to happen. The magician can see the structure, the build, the moment of impossibility. But the audience sees none of this internal architecture. They see only what you show them and hear only what you tell them. The gap between your intention and their experience is the gap that only live testing can reveal.

I performed for a mirror for six weeks and the mirror told me the effect was perfect. The mirror, it turns out, is not a useful audience. It already knows the trick.

The Testing Ladder

Over the past year, I have developed what I think of as a testing ladder — a sequence of increasingly demanding testing environments that a new effect must pass through before it earns a place in a paying show.

Rung One: The Single-Person Test. The first time I perform a new effect for someone who is not me, it is for one person. Usually a friend, sometimes a colleague, occasionally a family member. The goal is not to get a reaction. The goal is to get information. Can this person follow the setup? Do they understand what is happening before the climax? When the reveal comes, do they know what they are supposed to be reacting to?

I have learned to be specific about what I am testing. I tell the person beforehand: “I am working on something new. I am not looking for you to tell me it was great. I need you to tell me if there were any moments where you were confused about what was happening.” People are remarkably good at identifying confusion if you give them permission to be honest.

The single-person test has killed more effects in development than any other stage. About half of the pieces I bring to this stage do not survive it, because the confusion that was invisible to me in the hotel room becomes immediately obvious when a real human encounters the effect for the first time.

Rung Two: The Small Group Test. Effects that survive the single-person test move to small groups — three to five people, usually in a casual social setting. A dinner with friends. A gathering at someone’s home. An informal after-work drink.

The small group introduces dynamics that do not exist with a single person. People look at each other. They comment. They ask questions. They react not just to the effect but to each other’s reactions. This social dimension changes the experience in ways I cannot predict, and it reveals things about the effect that a one-on-one test cannot show.

In a small group, I learn whether the effect generates conversation or silence. Whether people lean forward or lean back. Whether they try to reconstruct what happened or simply accept the experience. I learn whether the effect works only for the person directly involved or whether bystanders experience it too. And I learn, crucially, whether the pacing works when real social dynamics are in play — when people interrupt, when someone asks me to repeat something, when the flow is disrupted by the unpredictable behavior of actual human beings in a social setting.

Rung Three: The Low-Stakes Live Performance. Effects that survive both previous tests move to what I think of as low-stakes live performance — situations where I am performing for a real audience but where the consequences of failure are minimal.

For me, these situations include informal corporate gatherings where I am a guest rather than the hired entertainment. Community events. Networking dinners. Situations where I am performing because someone asked me to show them something, not because I am being paid to deliver a professional show.

The low-stakes performance tests something the previous rungs cannot: how the effect functions within the context of a larger set. A new piece might work beautifully in isolation but create problems when placed between two other effects. It might disrupt the pacing. It might require a setup that contradicts something I said during the previous effect. It might require a tonal shift that the audience finds jarring.

I typically insert the new piece into an existing set of proven material, surrounded by effects I trust. This way, if the new piece fails, the surrounding material carries the show and the audience barely notices the stumble. If the new piece succeeds, I know it can hold its own in professional company.

Rung Four: The Paying Gig (Controlled). Only after a piece has survived all three previous rungs do I introduce it into a paying show. And even then, I do it with control. I place the new piece in a low-risk position — not as an opener, not as a closer, not at any structural point where failure would damage the show. I put it in the middle, flanked by strong tested material, in a spot where I can cut it short if something goes wrong and transition smoothly to the next piece.

The first three paying performances with a new piece are, in my mind, still testing. I am watching the audience response with analytical eyes, comparing their reactions to what I expected and noting any discrepancies. If the piece works as intended three times in a row, for three different audiences in three different venues, it graduates to the permanent repertoire.

If it does not, it goes back to the workshop.

What the Testing Ladder Reveals

The most valuable thing about this system is not that it prevents disasters at paying shows — although it does that. The most valuable thing is the information it generates at each stage.

Every rung of the ladder reveals something specific. The single-person test reveals confusion in the setup. The small group test reveals pacing and social dynamics problems. The low-stakes performance reveals context and sequencing issues. The controlled paying gig reveals whether the effect can function under professional pressure, with professional lighting, in professional venues, for audiences who have expectations.

No single test reveals all of this. The information is layered, and each layer builds on the previous ones. By the time an effect reaches my permanent repertoire, it has been stress-tested in conditions that progressively approximate the real thing, and the weaknesses that would have been invisible in a hotel room have been identified and addressed.

I think about this the way I think about product development in my consulting work. You do not ship version one to the customer. You prototype, you test with focus groups, you run a beta, you iterate based on feedback, and then you ship. The testing ladder is the beta program for a magic effect.
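My consulting instincts make me want to sketch the ladder as an actual pipeline. The toy Python model below is purely illustrative, not something I run in practice: every name in it (`Effect`, `RUNGS`, `record_test`) is invented for this sketch. It encodes the two rules described above: a piece must pass each rung in order, and a failure at any rung sends it back to the workshop; at the final rung, three consecutive successes at paying gigs graduate it to the permanent repertoire.

```python
# Toy sketch of the testing ladder. All names are illustrative inventions,
# not part of any real tracking system I use.

RUNGS = ["single_person", "small_group", "low_stakes", "paying_controlled"]


class Effect:
    def __init__(self, name):
        self.name = name
        self.rung = 0               # index into RUNGS
        self.paying_successes = 0   # consecutive wins at the final rung
        self.graduated = False      # True once in the permanent repertoire

    def record_test(self, passed):
        """Advance on a pass; a failure sends the piece back to the workshop."""
        if self.graduated:
            return
        if not passed:
            self.rung = 0
            self.paying_successes = 0
            return
        if RUNGS[self.rung] == "paying_controlled":
            self.paying_successes += 1
            # Three consecutive successes at paying gigs -> permanent repertoire.
            if self.paying_successes == 3:
                self.graduated = True
        else:
            self.rung += 1


# Six straight passes: three rungs climbed, then three paying successes.
effect = Effect("new card piece")
for result in [True, True, True, True, True, True]:
    effect.record_test(result)
print(effect.graduated)  # True
```

The unforgiving reset on failure is the point of the sketch: a piece never "partially" survives a rung, which mirrors the beta-program analogy — you do not ship a build that failed any gate.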

The Emotional Difficulty of Cutting Material

I want to be honest about something that the systematic framing of this process obscures: it is emotionally difficult to cut material you have invested time in.

When an effect fails the single-person test — when the friend sitting across the table says “I honestly had no idea what was supposed to happen there” — there is a moment of genuine pain. You have spent weeks, sometimes months, developing this piece. You have rehearsed it hundreds of times. You have visualized the audience reaction, the applause, the expressions of wonder. And now someone is telling you it does not work.

The temptation, which I have given in to more than once, is to explain. “Well, the idea is that the card was supposed to — no, wait, let me show you again, because I think I did not set it up clearly enough.” This is the equivalent of a startup founder explaining to a customer why the product is actually good despite the customer’s experience. The customer’s experience is the data. Your explanation is not.

I have learned to treat each test as data collection, not as a performance. When I am testing, I am not performing. I am conducting an experiment. The hypothesis is that this effect will produce a specific audience experience. The test either supports the hypothesis or it does not. If it does not, the effect needs revision. The emotional investment is irrelevant to the data.

This is easier said than felt. But the alternative — performing untested material at professional events and hoping for the best — is worse. The Graz incident taught me that. One mediocre piece in an otherwise strong show does not just weaken that moment. It weakens the audience’s trust in everything that follows. They start watching with a slightly more skeptical eye. They hold back slightly on their reactions. The contract between performer and audience, the implicit agreement that says “I will show you something worth your attention,” has been strained.

I would rather cut twenty good effects during testing than perform one bad one at a paying show. The math is straightforward. The emotions are not. But the math wins.

The Freedom of a Tested Repertoire

Here is what nobody tells you about rigorous material testing: it makes performing more fun.

When every piece in your set has survived the full testing ladder, you walk on stage with a different kind of confidence. Not the false confidence of the hotel room mirror, but the earned confidence of knowing that every single thing you are about to do has been validated by real audiences in real conditions. You know the setup works because you have watched thirty people follow it without confusion. You know the climax lands because you have seen it land a dozen times. You know the pacing is right because you have tested it in small groups and large crowds and everything in between.

This confidence frees you to focus on the things that matter in the moment — connecting with the specific audience in front of you, reading the room, adjusting your energy, being present. You are not worrying about whether the next piece will work. You know it will work. Your job is to make it work as well as it possibly can for these particular people, in this particular room, on this particular night.

That is the gift of systematic testing. Not the elimination of risk — risk can never be entirely eliminated in live performance. But the elimination of unnecessary risk. The assurance that you have done everything in your power to put only your strongest, most audience-tested material in front of the people who are paying to see you perform.

Test everything. Trust nothing that has not been tested. And when something fails the test, be grateful for the information and move on. The next piece might be the one that makes the show.

Written by Felix Lenhard

Felix Lenhard is a strategy and innovation consultant turned card magician and co-founder of Vulpine Creations. He writes about what happens when you apply systematic thinking to learning a craft from scratch.