
QuZhan51496/paper2anything


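Converts academic paper PDFs into conference posters. Acting as the conductor, the agent parses the content through step-by-step scripts, decides on visual elements, hand-writes the HTML, and renders iteratively until the poster is high quality.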


Install All Skills

npx skills add QuZhan51496/paper2anything --all -g -y

List skills in collection

npx skills add QuZhan51496/paper2anything --list

Skills in Collection (4)

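Use when the user wants to generate a conference poster from a paper PDF, or needs to turn an academic document into visual presentation material.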
paper2poster/SKILL.md
npx skills add QuZhan51496/paper2anything --skill paper2poster -g -y
SKILL.md
Frontmatter
{
    "name": "paper2poster",
    "effort": "high",
    "arguments": [
        "pdf-path"
    ],
    "description": "Convert academic papers (PDF) into conference posters (HTML\/PNG). You are the conductor: you decide what each section needs — an original paper figure or text — write the outline, hand-author the poster HTML, and iterate on the render using your own visual read and a blind-reader content quiz. Use when the user wants a poster from a paper PDF.",
    "allowed-tools": [
        "Bash",
        "Read",
        "Write",
        "Glob",
        "Grep",
        "Agent",
        "AskUserQuestion"
    ],
    "user-invocable": true,
    "disable-model-invocation": false
}

Paper2Poster — Conference Poster Skill (You as Conductor)

Convert a paper PDF into an academic conference poster (HTML/PNG) by walking a small set of CLI scripts. You are the conductor: this file is the recipe, not an orchestrator. There is no run_pipeline.py — at each step you run one Bash command, read the intermediate artifact, and ask the user for confirmation at the decision points below.

PDF
  → parse_pdf.py            (MinerU → content.md + figures/)
  → intake QA               (you ask size/venue/authors/visual policy)
  → auto_outline.py         (digest.json + assets[])
  → choose visuals          (you read parsed/figures/ + captions: which sections use an original figure, which use text)
  → outline.json            (you write from content.md; user confirms)
  → poster.html             (you hand-author the poster: original figures where they help, text elsewhere)
  → render + score          (Playwright PNG → deterministic geometry check + your own visual read + blind-reader content quiz)
  → iterate on poster.html  (edit + re-render + re-score until it reads like a real poster)
  → poster.png

problem_context, method_main, and result_evidence are a useful reading-order spine to think about — what's the paper about, how does it work, what's the evidence. For each, decide what carries it best: an original paper figure if one reads well at poster scale, or text (a worded explanation, a labelled box, a short list) if no figure fits. There is no figure quota — use as many or as few original figures as the content calls for, down to zero. A text-only section, or a text-only poster, is a legitimate outcome when the figures don't earn their place.


How you run this skill

This skill only works if you execute it as a sequence of small Bash + Read + AskUserQuestion turns. Do not try to short-circuit it.

  1. Run one step at a time with the Bash tool, exactly as written below. Use absolute paths under ${SKILL_DIR} (the directory this skill lives in — e.g. <…>/paper2anything/paper2poster; set it once per shell with export SKILL_DIR=<…>/paper2anything/paper2poster).
  2. Read the intermediate artifact before moving on:
    • after Step 3: the figures you considered, viewed in parsed/figures/ (and their captions in digest.json), and which sections you decided to carry with text instead,
    • after Step 5: your rendered poster.png, plus your visual read and the blind-reader quiz result.
  3. Pause for the user at the decision points with AskUserQuestion:
    • After Step 2 — intake: size, venue, author block, visual policy.
    • After Step 3 — is your per-section visual plan (which sections use an original figure, which use text) acceptable?
    • After Step 4 — is the outline structure acceptable?
    • After Step 8 — accept the poster, or revise/restyle it?
  4. Let the content decide whether a section gets a figure. Use an original paper figure where one genuinely helps; carry a section with text when no figure earns its place. Don't pad the poster with weak figures to hit a count, and don't strip a figure that's doing real work. A text-only section — or a text-only poster — is fine.
  5. Score every render, then iterate (Steps 5–7). After each render, run the deterministic geometry check and look at the PNG yourself (your visual read — hierarchy/density/balance/readability); run the blind-reader content quiz (Step 7) at milestones rather than on every micro-edit (it spawns a subagent, so it costs more than reading a PNG). Let what they surface drive the next edit; don't ship the first render unscored. The geometry check is two-sided: not just "no overflow" but also a fill ratio ≥ 0.95 — a poster that fits but leaves large whitespace (or shrinks text to do so) fails and must be iterated. Verify this gate yourself; how you reach it is your judgment.
  6. Don't overwrite a good render — keep scored candidates. Iteration is not always monotonic: an edit aimed at one issue can regress overall quality, and the version you had three edits ago may have read better. Before a non-trivial restyle or structural change, save the current render as a numbered candidate (e.g. copy poster.html/poster.png to poster_candN.html/poster_candN.png) and record its scores. Pick the final from the best-scoring candidate, not just the latest edit. Never let a higher-scoring intermediate be silently overwritten by a worse one.
  7. On error, stop and diagnose. Do not silently fall back to a degraded path to "make it run."

Step 0: Environment Check

Unified environment: every python command in this skill runs in the paper2anything package's unified conda environment (created from the top-level environment.yml), each prefixed with conda run -n paper2anything --no-capture-output. The pip install below is only a fallback when the unified environment is missing a dependency; playwright install chromium still needs to be run once on its own.

conda run -n paper2anything --no-capture-output python ${SKILL_DIR}/scripts/check_env.py

If anything is missing:

conda run -n paper2anything --no-capture-output pip install Pillow requests playwright
conda run -n paper2anything --no-capture-output playwright install chromium

Credentials (unified): all keys live in the package-root .env (copy from .env.example, gitignored). Export once per shell before running any command below: set -a; source <paper2anything package root>/.env; set +a. This skill needs only MINERU_API_TOKEN (PDF parsing). Everything else — figure choice, design, the visual read (Step 6), and the content check (Step 7) — is done by you and a blind subagent, with no external VLM / LLM API.

Variable           Purpose              Default
MINERU_API_TOKEN   MinerU PDF parsing   none (required)
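Before Step 1 it can help to confirm both halves of the setup at once. The snippet below is a minimal stand-in written for this guide, not the shipped check_env.py (whose checks aren't shown here). It assumes the only required variable is MINERU_API_TOKEN, as stated above, and that the Python dependencies are the three packages from the pip fallback; it cannot tell whether the Chromium download from playwright install chromium has happened.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

# Import names for the pip fallback packages (Pillow is imported as "PIL").
REQUIRED_MODULES = ("PIL", "requests", "playwright")
REQUIRED_ENV = ("MINERU_API_TOKEN",)


def missing_requirements(
    env: Mapping[str, str] = os.environ,
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> list[str]:
    """Return a readable list of what is missing; an empty list means ready."""
    problems = []
    for name in REQUIRED_ENV:
        if not env.get(name):
            problems.append(f"env var {name} is unset (source the package-root .env)")
    for module in REQUIRED_MODULES:
        if find_spec(module) is None:
            problems.append(f"python module {module} is not importable")
    return problems


if __name__ == "__main__":
    for line in missing_requirements():
        print("MISSING:", line)
```

Run it with the same conda run prefix as the other commands so it inspects the unified environment rather than whatever Python is first on PATH.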

Step 1: Parse the PDF

All artifacts land next to the paper in <pdf dir>/.paper2anything/poster/<stem>/ (multiple papers in the same directory are split by <stem> and never overwrite each other). Each step below is an independent Bash call that shares no shell variables with the others, so every command block that needs the run directory recomputes RUN_DIR from $pdf_path right at its top. Treat RUN_DIR the same way you treat ${SKILL_DIR} and $pdf_path: make sure each is set in every call, and never export one in an earlier step and expect it to survive into later steps. The scripts still live in ${SKILL_DIR}/scripts.

RUN_DIR="$(dirname "$pdf_path")/.paper2anything/poster/$(basename "${pdf_path%.*}")"
mkdir -p "$RUN_DIR"
conda run -n paper2anything --no-capture-output python ${SKILL_DIR}/scripts/parse_pdf.py "$pdf_path" \
  --output-dir "${RUN_DIR}/parsed"

MinerU (cloud) is the only parser — on failure the script exits non-zero. Fix the token / network and re-run.
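If you script any of the later checks in Python rather than Bash, the run directory can be derived the same way. A minimal sketch (run_dir_for is a hypothetical helper, not one of the skill's scripts) that mirrors the shell expression above for an ordinary file name with an extension:

```python
from pathlib import Path


def run_dir_for(pdf_path: str) -> Path:
    """Python mirror of:
    $(dirname "$pdf_path")/.paper2anything/poster/$(basename "${pdf_path%.*}")
    """
    pdf = Path(pdf_path)
    # Path.stem drops only the last suffix, as ${pdf_path%.*} does for a
    # normal file name: "paper.v2.pdf" -> "paper.v2".
    return pdf.parent / ".paper2anything" / "poster" / pdf.stem
```

Two PDFs in the same folder get different stems, which is what keeps their runs from overwriting each other.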

Produces:

  • parsed/content.md — full text in Markdown
  • parsed/metadata.json — title, authors, affiliations, abstract
  • parsed/mineru_raw.json — typed blocks with bbox + captions (MinerU only)
  • parsed/figures/, parsed/tables/

Step 2: Poster intake — confirm layout-critical choices [INTERACT]

Before designing anything, collect the few choices that actually change the poster. Now that the PDF is parsed you can show the user the parsed title/authors and ask the rest in one short grouped AskUserQuestion (don't turn this into a long form). The full checklist and defaults are in references/poster_intake_qa.md; the five that matter:

  1. Size / aspect — e.g. 48x36 in landscape, 36x24, A0, 16:9 screen. Default: 48x36 in landscape (or 16:9 if the user says demo/slide/screen). This sets the render pixel size in Step 5.
  2. Venue / context — which conference, workshop, or review setting. Tunes density and tone, not hard rules.
  3. Author block — use the parsed authors/affiliations (show them), anonymize (Anonymous Authors, blind review), or custom text. Default: parsed.
  4. Visual policy — use the original paper figures. When an original figure isn't poster-friendly (too cluttered, too small, or no figure fits a section), use text for that section instead — a worded explanation, a text box, or a short list. Default: original figures where they read well, text otherwise; no figure quota.
  5. Output directory — the run/work directory for this whole job: every artifact the workflow produces (parsed/, digest.json, outline.json, poster.html, poster.png, the score JSONs, any candidates) is written here, not just the final poster. Default: ${RUN_DIR}. Ask so the user can redirect the entire run to a folder they choose (e.g. their Desktop or a project dir); if they name one, use it as the --output-dir / --output base for every step below (Step 1 parse, Step 2 digest, Step 4 outline, Step 5 render, Steps 6–7 scores) so nothing lands in the default work dir. Report that path in Step 8.
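The render size is the physical size times a chosen dpi; the examples in this guide use 48 dpi for 48x36 in and 96 dpi for 20x15 in. A hypothetical helper sketching that conversion, assuming intake sizes are written as "WxH in", "WxH mm", "WxH px", or the name A0 (841 x 1189 mm per ISO 216). Anything else, such as "16:9 screen", is rejected so you pick pixel dimensions explicitly:

```python
import re

MM_PER_INCH = 25.4
PRESETS_MM = {"a0": (841.0, 1189.0)}  # ISO 216 A0, portrait order


def poster_px(size: str, dpi: int = 48) -> tuple[int, int]:
    """Convert an intake size string into (width_px, height_px).

    Accepts 'WxH in' (inches are the default unit), 'WxH mm', 'WxH px',
    or a named preset such as 'A0', plus an optional 'landscape' or
    'portrait' word that fixes the orientation.
    """
    text = size.lower()
    m = re.search(r"(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*(in|mm|px)?", text)
    if m:
        w, h, unit = float(m.group(1)), float(m.group(2)), m.group(3) or "in"
    else:
        name = next((k for k in PRESETS_MM if k in text), None)
        if name is None:
            raise ValueError(f"unrecognised poster size: {size!r}")
        (w, h), unit = PRESETS_MM[name], "mm"
    # Respect an explicit orientation word; otherwise keep the order given.
    if ("landscape" in text and w < h) or ("portrait" in text and w > h):
        w, h = h, w
    if unit == "px":
        return round(w), round(h)
    if unit == "mm":
        w, h = w / MM_PER_INCH, h / MM_PER_INCH
    return round(w * dpi), round(h * dpi)
```

For example, poster_px("48x36 in landscape") gives the 2304x1728 used by the Step 5 commands, and poster_px("20x15 in", dpi=96) gives 1920x1440.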

The output is an HTML/PNG poster (poster.html + poster.png). If the user just says "make a poster" with no answers, state the defaults you're using and proceed; don't block. Record the answers in outline.poster_intake (Step 4) so the design and any critique treat them as hard constraints. The size you settle on here is what Step 5 renders at (e.g. 20x15 in → 1920x1440 px at 96 dpi, or 48x36 in → 2304x1728 px at 48 dpi; scale to taste).

Once the intake is settled, build the paper digest:

RUN_DIR="$(dirname "$pdf_path")/.paper2anything/poster/$(basename "${pdf_path%.*}")"
conda run -n paper2anything --no-capture-output python ${SKILL_DIR}/scripts/auto_outline.py \
  --parsed-dir "${RUN_DIR}/parsed" \
  --output     "${RUN_DIR}/digest.json"

digest.json is ~17× smaller than mineru_raw.json: section-grouped, figures/tables attached to their nearest preceding section, References/Appendix dropped. It also exposes a typed assets[] array (PosterAgent-style) where each entry has type (claim / metric / figure / table), role (problem / method / result / takeaway / contribution / limitation), and priority (1–5). The role/priority tags are raw keyword heuristics: convenience hints, not a ranking to trust. When you write the outline (Step 4), judge content importance yourself from content.md; don't defer to these scores.

(mineru_raw.json is always produced by the MinerU parse, so auto_outline.py always has its input.)
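Because the tags are heuristics, they are most useful as a map of where to look in content.md. A small sketch for that, assuming each assets[] entry is a JSON object with the type, role, and priority keys described above; the helper names and the spine-to-role mapping are illustrative. It deliberately ignores priority, since the text doesn't say whether 1 or 5 is the top:

```python
import json
from collections import Counter, defaultdict
from pathlib import Path

# Assumed mapping from the reading-order spine to the digest's role tags.
SPINE_ROLES = ("problem", "method", "result")


def load_digest(path: str) -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))


def assets_by_role(digest: dict) -> dict[str, list[dict]]:
    """Group digest['assets'] by role, keeping the digest's own order."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for asset in digest.get("assets", []):
        groups[asset.get("role", "unknown")].append(asset)
    return dict(groups)


def role_type_counts(digest: dict) -> dict[str, Counter]:
    """role -> Counter of asset types (claim / metric / figure / table)."""
    return {role: Counter(a.get("type", "?") for a in items)
            for role, items in assets_by_role(digest).items()}


def spine_gaps(digest: dict) -> list[str]:
    """Spine roles with no tagged asset: a hint to read content.md more
    closely there, not proof the paper lacks that content."""
    present = assets_by_role(digest)
    return [role for role in SPINE_ROLES if role not in present]
```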


Step 3: Decide each section's visual — figure or text, YOU choose by eye [INTERACT]

Deciding what carries each section is the same judgment you make when hand-authoring the HTML (does this section need a figure at all; if so, which one dominates, which is wide enough to span full width). So make it yourself, here, by looking at the figures — not with a keyword script.

  1. List the extracted figures. digest.json has a figures[] / tables[] array (each with image_path, caption, section, page); the image files live in parsed/figures/. Read the captions, and Read the actual image files for the plausible candidates — a caption that says "pipeline" can sit over a figure that is useless at poster scale, and only your eyes catch that.

  2. For each part of the reading-order spine, decide figure-or-text:

    • problem_context — frames the task / prior-work limitation / a vivid input example. "What is this paper about" should land here.
    • method_main — how it works: the dominant pipeline / architecture / algorithm.
    • result_evidence — the strongest evidence for the headline claim: a comparison plot, a qualitative grid, an ablation curve, or results numbers.

    For each, use an original figure if one is self-explanatory at a glance, large enough to stay sharp when enlarged, and not awkwardly tall/narrow — otherwise carry that part with text (a worded explanation, a labelled box, or a short list). Don't reuse the same figure twice, don't force a figure where none fits, and don't cap yourself at three — a section outside this spine can take a figure too if it earns one. The spine is a thinking aid, not a quota.

  3. Confirm with the user via AskUserQuestion: lay out your per-section plan (for each: figure id + one-line "why this one", or "text: no good figure") and ask "Accept this plan, or swap something?" Proceed only when accepted.

You don't need to copy files anywhere — just record each chosen figure's path so you can reference it in outline.json (Step 4) and embed it in the HTML (Step 5).
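To shortlist candidates before opening each image, you can print one line per entry with its shape. A sketch under stated assumptions: figures[] / tables[] entries carry the image_path, caption, section, and page keys listed in item 1; the 2:1 "wide" cut-off follows the Step 5 guidance, while the tall (0.6) and small-source (600 px) thresholds are arbitrary choices for illustration. It only flags shape; whether a figure reads at a glance still needs your eyes:

```python
from pathlib import Path
from typing import Callable, Iterable

WIDE_RATIO = 2.0    # ~2-3:1 figures need a full column or full width
TALL_RATIO = 0.6    # assumption: narrower than this reads as awkwardly tall
MIN_WIDTH_PX = 600  # assumption: smaller sources may blur when enlarged


def pil_size(path: str) -> tuple[int, int]:
    from PIL import Image  # Pillow, from the Step 0 install list
    with Image.open(path) as im:
        return im.size


def figure_notes(entries: Iterable[dict], base: str = ".",
                 size_of: Callable[[str], tuple[int, int]] = pil_size) -> list[str]:
    """One line per figure/table: page, section, size, aspect, placement hint."""
    notes = []
    for e in entries:
        w, h = size_of(str(Path(base) / e["image_path"]))
        ratio = w / h
        if ratio >= WIDE_RATIO:
            hint = "wide: full column or full width"
        elif ratio <= TALL_RATIO:
            hint = "tall/narrow: likely awkward at poster scale"
        else:
            hint = "regular"
        if w < MIN_WIDTH_PX:
            hint += "; small source, check sharpness"
        caption = (e.get("caption") or "")[:60]
        notes.append(f"p{e.get('page', '?')} [{e.get('section', '?')}] "
                     f"{e['image_path']} {w}x{h} ({ratio:.2f}:1) {hint} | {caption}")
    return notes
```

A typical call would pass digest["figures"] + digest.get("tables", []); whether image_path is relative to the run directory or to parsed/ isn't stated here, so set base to whichever makes the paths resolve.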


Step 4: Write the outline [INTERACT]

You read the full parsed paper (parsed/content.md) and write outline.json directly with the Write tool, following the per-section visual plan you set in Step 3 (which sections embed an original figure, which are carried by text). You are the conductor here — selecting and condensing the paper's content into poster form is a judgment task, not a mechanical extraction. Do not just copy digest.json's auto-extracted sections (they are dense source prose); decide yourself what belongs on the poster and how to phrase it.

Goal, not quota. Make a poster that reads like a real conference poster — study the 8 real CVPR/ICLR examples in references/poster_examples/ for how much text, how many sections, and what density real posters use. Let the paper's own shape drive the structure: a method-heavy paper may need a long process section with a big diagram; a results paper may be one line plus a dominant table. There is no fixed section count or bullet count — use what the content and the real-poster aesthetic call for.

The one hard constraint is physical, not stylistic: every bullet and label must fit inside its panel and stay readable at 1–2 m — no overflow, no text shrunk to fit. The geometry check and your visual read in Step 5 measure this; if a panel overflows or is too sparse, that's your signal to cut, tighten, or add — not a reason to keep dense source text. Write each bullet as **Bold lead**: short detail, keep raw numbers inside worded sentences/lists rather than as standalone visual anchors, and for any section you decided gets an original figure (Step 3), reference it in that section's figure field. Sections you decided to carry with text simply have no figure field.

Then use AskUserQuestion to confirm structure with the user before continuing to the render step.

If the user wants an explicitly text-only poster, set outline.poster_intake.visual_policy = "text_only"; this also short-circuits the fallback figure gate. (Choosing text for some sections while using figures in others does not need this flag — it's just your normal per-section judgment from Step 3.)

(Outline JSON schema is below; color palettes too.)
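Before asking for confirmation it can be worth machine-checking the outline you wrote against the rules in this step. A sketch, assuming the field names from the Outline JSON Format section at the end of this guide and that a section's figure field, when present, is a path string (other shapes are compared by their JSON text). It checks only the bullet form, the text_only policy, and figure reuse; judging content is still yours:

```python
import json
import re

BULLET = re.compile(r"^\*\*[^*]+\*\*:\s+\S")  # "**Bold lead**: short detail"
INTAKE_KEYS = ("size", "venue", "author_policy", "output_target",
               "output_dir", "visual_policy")


def outline_problems(outline: dict) -> list[str]:
    """Return rule violations found in an outline dict; empty means clean."""
    problems = []
    intake = outline.get("poster_intake", {})
    for key in INTAKE_KEYS:
        if key not in intake:
            problems.append(f"poster_intake.{key} missing")
    text_only = intake.get("visual_policy") == "text_only"
    seen = set()
    for sec in outline.get("sections", []):
        title = sec.get("title", "?")
        for bullet in sec.get("content", []):
            if not BULLET.match(bullet):
                problems.append(f"{title}: bullet not in '**Lead**: detail' form: {bullet[:40]!r}")
        fig = sec.get("figure")
        if not fig:
            continue  # text-carried section: nothing more to check
        if text_only:
            problems.append(f"{title}: has a figure but visual_policy is text_only")
        key = fig if isinstance(fig, str) else json.dumps(fig, sort_keys=True)
        if key in seen:
            problems.append(f"{title}: reuses a figure already placed elsewhere")
        seen.add(key)
    return problems
```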


Step 5: Design the poster — YOU hand-author the HTML

You are the poster designer, not a template picker. The best posters in this pipeline are the ones you write yourself: you have seen the paper, you know each section's visual plan (figure or text) and the real pixel dimensions of any figures you chose, and you can study real conference posters. A fixed template cannot make the design judgments a good poster needs — whether a section even wants a figure, which figure dominates, whether a wide figure spans full width, where the claim anchors the eye, how dense each region is. So write the poster's HTML directly and iterate on it by scoring the render. There is no template to select and no repair-op vocabulary to obey.

Design fresh for each paper — do not reuse a house style. A real risk when you've made posters before is silently copying your last one's look (same title band, same color blocking, same grid). Resist it. Let this paper's content, field, and figure shapes drive the layout: a benchmark paper, an RL/method paper, and a systems paper should not look alike. Vary the palette (match the field or the paper's own accent color), the structure (3-column grid vs a left-spine flow vs a hero-on-top), and what dominates. If your new draft looks like your previous poster, that's a signal to rethink, not a shortcut to take.

  1. Study the references. Read 2–3 of the real CVPR/ICLR posters in references/poster_examples/ so your design targets that visual language (strong title band, a dominant hero element, a one-sentence claim, color-blocked sections, generous whitespace, big type). Those examples are wide (~2:1); at a squarer size like the 48×36 default (≈4:3) a full-width hero band or figure eats too much vertical budget and overflows — keep the hero column-scoped (or pick a 2:1 size) so the vertical fits.

  2. Check any figures' real shape. For each section you decided gets an original figure, get its pixel size (e.g. conda run -n paper2anything --no-capture-output python -c "from PIL import Image; print(Image.open('…').size)"). A wide figure (≈2–3:1) must be placed full-column or full-width so it stays sharp — never squeeze a wide figure into a narrow box; that is what makes figures look like thumbnails.

  3. Write poster.html yourself with the Write tool. While iterating, reference figures by relative path (src="parsed/figures/x.jpg") — screenshot.py and geom_check.py resolve them via file://, and the file stays small and editable. Only at the end run collect_figures once — conda run -n paper2anything --no-capture-output python ${SKILL_DIR}/scripts/collect_figures.py "${RUN_DIR}/poster.html" — to copy the referenced figures into a sibling images/ directory and rewrite each src to images/<name>, giving a portable poster (poster.html + images/). Design freely — pick the grid, the type scale, the color blocking, where the claim sits, what dominates. Size the poster to the intake (poster_intake.size, e.g. 20×15 in → 1920×1440 px at 96 dpi). Use the outline you wrote in Step 4 as the content source. Sections you planned as text get a worded treatment (a paragraph, a styled text box, or a short list); sections with a figure embed it.

    Figure CSS — make the border hug the image, never frame empty space. A recurring bug: setting width:100% + max-height:X + object-fit:contain on an <img> paints the border on the full-width box while the image shrinks to fit inside it, leaving large white margins between the border and the actual picture (a small image floating in a big framed box). Avoid it — pick one of two patterns so the border traces the image edge:

    • Fill the column (preferred when the figure's natural height fits): display:block; width:100%; height:auto; + border. The image spans the column and the border hugs it; this also keeps the figure's own labels as large as possible.
    • Cap the height, center it (when full-width would be too tall): display:inline-block; max-height:X; width:auto; height:auto; + border, in a text-align:center wrapper. The border still traces the image, and a small side margin is fine; what you must avoid is object-fit:contain on a fixed-width box. Don't combine width:100% with object-fit:contain.

    Never force a fixed height (or fixed width and height) to fill a gap — it distorts the figure. Tempting fix when a panel has leftover whitespace: stretch its image taller with height:640px. Don't. A figure must always scale proportionally — set at most ONE axis (width:100%; height:auto, or max-height:X; width:auto) and let the other follow. A real run set height:640px; width:auto on a 1.64:1 chart; a competing width constraint then pinned the width too, squashing it to 1.02:1 (60% vertical stretch) — a screenshot bug the eye catches instantly. Fill leftover whitespace with content (a takeaway box, an extra bullet) or by rebalancing columns — never by distorting a figure. Verify after every render: renderedWidth/renderedHeight must equal naturalWidth/naturalHeight within ~0.02 for every <img>; flag any mismatch as a distortion bug.

  4. Render, then score it — three checks, every render. Screenshot at the poster's exact pixel size with the standalone screenshot instrument:

    RUN_DIR="$(dirname "$pdf_path")/.paper2anything/poster/$(basename "${pdf_path%.*}")"
    conda run -n paper2anything --no-capture-output python ${SKILL_DIR}/scripts/screenshot.py \
      "${RUN_DIR}/poster.html" \
      "${RUN_DIR}/poster.png" \
      --width 2304 --height 1728
    

    The example uses the default intake size (48×36 in → 2304×1728 at 48 dpi). Set --width/--height to your intake size (e.g. 20×15 in → 1920×1440 at 96 dpi) — they must match the size you recorded in poster_intake, not the example. This tool only screenshots the HTML you wrote — it picks no template and makes no design decision. Then run all three checks (none is optional — they are how you know what to fix next):

    • (a) Deterministic geometry check — two-sided. Overflow, clipping, unequal columns, and underfill are all measurable — measure them, don't eyeball a downscaled PNG. Run the shipped checker for the three structure-agnostic gates (no overflow; fill ratio ≥ 0.95 computed from the true content frontier — real text/image extent, not a fixed-height container; per-image aspect within 0.02). It exits non-zero on failure:

      conda run -n paper2anything --no-capture-output python ${SKILL_DIR}/scripts/geom_check.py \
        "${RUN_DIR}/poster.html" 2304 1728
      

      (Pass the same width/height you rendered at; the example uses the 48×36 default.) The checker does not know your panel structure, so still measure the per-panel voids below yourself (Playwright box model) and read the PNG:

      • No overflow: body.scrollHeight must be <= the canvas height (else content spills off the bottom); each figure's <img> right/bottom must sit inside its panel; columns should end at roughly the same y.
      • No underfill (hard gate): the poster must actually fill the canvas. Compute a fill ratio (content height ÷ canvas height, and per-column / per-panel where it helps); the fill ratio must be ≥ 0.95. A poster that merely "doesn't overflow" is not done — large bottom whitespace, sparse panels, or text shrunk small enough to leave gaps all fail this gate.
      • No trapped internal whitespace (per-panel gate): page-level scrollHeight does NOT catch this — when flexbox stretches panels to equal height, a panel with too little content silently pools a large empty gap at its bottom while the page still looks "full." Measure both the bottom gap AND the gaps between a panel's children: for each panel, panel.bottom − lastChild.bottom (bottom void) and max(child[j].top − child[j−1].bottom) (inter-element void); flag either if it exceeds ~60px. Measuring only the bottom gap has a blind spot: justify-content:space-between (and similar) makes the bottom gap read ~0 while shoving the same whitespace between the figure and the text — a real run looked "fixed" by the bottom test yet had a 347px hole between a figure and its caption. Fill a void with real content, not by spacing things apart. The right fixes: add genuine paper content (one or two more bullets — papers usually have more findings than one panel shows), enlarge a figure to fill the column (proportionally — never a forced height), bump the body type scale (also helps legibility), or rebalance which sections share a column. The wrong fix is space-between / margin:auto / giant gaps, which just relocate the void. (A real run that filled the voids with real content read markedly cleaner than the same poster "fixed" with space-between, which only moved the whitespace around.)
      • No block overlap (manual — the geometry gate does NOT catch this). A geom_check PASS does not prove blocks don't overlap: a flex:1 column whose content exceeds its shrunk height escapes downward (overflow is visible by default) and can paint over a following full-width band, yet the box and the content both fit the canvas so overflow/fill/clip all pass. While you measure the per-panel boxes, also confirm each column's content bottom sits above the next full-width band's top — never trust the gate alone for overlap.
      • No distorted figures (aspect-ratio gate): for every <img>, the rendered width/height must match naturalWidth/naturalHeight within ~0.02. A figure stretched to fill space (e.g. a forced height:) is an obvious eyesore you and any viewer catch instantly. (A real run squashed a 1.64:1 chart to 1.02:1.) If flagged, restore proportional scaling (see the figure-CSS rule in item 3 above) and fill the freed space with content, not a stretched image.
      • No clipped panels (panel-clip gate): the checker also flags a panel that hides its own overflow — overflow:hidden/auto on a flex equal-height column whose content is taller than the box silently cuts the bottom off, and page-level scrollHeight won't catch it. If clipped_panels fires, don't mask it with overflow:hidden — drop the hidden so the box can grow, or cut the content until it genuinely fits.

      This check is two-sided on purpose: a single "did it overflow?" test has only a ceiling and silently passes an under-filled, shrunk-down poster. Do not stop at "fits." After every edit, re-measure and confirm 0.95 ≤ fill ≤ 1.0 with no overflow — this is a pass/fail gate you verify yourself, not a suggestion, and how you reach it (what to resize, cut, reflow, or enlarge) is your judgment. Your eyes on a shrunk full-poster PNG can mis-read a full-bleed figure as "clipped" and miss both real bottom overflow and dead whitespace — trust the pixel math over your eyes for anything geometric.

    • (b) Your own visual read — required, every render. Read the rendered poster.png yourself and judge hierarchy, density, balance, and readability — the subjective read the geometry check can't give you. This is the standing "eyes" of the loop. Caveat: some harnesses/proxies strip image blocks, so Read returns empty for a valid image — sanity-check at run start by Read-ing one small known PNG. If it comes back empty you have no eyes here: lean on the deterministic geometry check (a), keep the design conservative, and tell the user the visual read was unavailable. Never fake a visual judgment on a PNG you couldn't actually see.

    • (c) Blind-reader content check — at milestones (Step 7). A good-looking poster can still fail to convey the paper. You write questions from the paper, then spawn a blind subagent given only the rendered poster.png (not the paper) to answer them — which roles it gets wrong are the roles not landing. Run this at milestones (it spawns an agent), not on every micro-edit.

  5. Iterate until it reads like a real poster AND scores well. Let the three checks drive each edit: the geometry check catches overflow/clipping/imbalance and underfill; your visual read catches weak hierarchy, cramped or sparse panels, a bare number used instead of a sentence, poor balance; the blind-reader quiz catches content that isn't getting through. When a check flags something, edit the HTML and re-render — shrink/cut overflowing text, enlarge a figure that reads as a thumbnail, fill or merge an empty panel with text, rewrite a number into a claim sentence, or move/replace a figure. Re-run the three checks after each edit. Before a big restyle or structural change, snapshot the current render as a numbered candidate (poster_candN.html + poster_candN.png) with its scores, so a regression doesn't destroy a version that read better — iteration isn't always monotonic, and you pick the final from the best-scoring candidate, not the latest edit. Repeat until the geometry check passes (no overflow and fill ratio ≥ 0.95), your visual read is clean, and the blind-reader quiz shows the key roles land. Don't stop the moment content stops overflowing — that only clears the ceiling; verify the fill gate too, or you ship a shrunk-down poster full of whitespace. This is open visual iteration — your judgment guided by the scores, not a fixed op set.
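geom_check.py covers the page-level gates, but the per-panel voids and the column/band overlap are yours to measure. Once you have element boxes (for example from Playwright's locator.bounding_box(), which returns x, y, width and height in CSS pixels; take top = y and bottom = y + height), the arithmetic is small. A sketch with hypothetical helper names, using the thresholds stated in check (a):

```python
from dataclasses import dataclass

MIN_FILL = 0.95    # fill gate
VOID_PX = 60       # flag a bottom or inter-element void above ~60px
ASPECT_TOL = 0.02  # rendered vs natural aspect tolerance


@dataclass
class Box:
    top: float
    bottom: float


def fill_ok(content_bottom: float, canvas_height: float) -> bool:
    """Two-sided gate: no overflow (ratio <= 1.0) and no underfill (>= 0.95)."""
    ratio = content_bottom / canvas_height
    return MIN_FILL <= ratio <= 1.0


def panel_voids(panel: Box, children: list[Box]) -> tuple[float, float]:
    """Return (bottom void, largest gap between consecutive children) in px."""
    kids = sorted(children, key=lambda b: b.top)
    bottom_void = panel.bottom - max(b.bottom for b in kids)
    gaps = [b.top - a.bottom for a, b in zip(kids, kids[1:])]
    return bottom_void, max(gaps, default=0.0)


def voids_ok(panel: Box, children: list[Box]) -> bool:
    bottom_void, inner_gap = panel_voids(panel, children)
    return bottom_void <= VOID_PX and inner_gap <= VOID_PX


def aspect_ok(rendered_w: float, rendered_h: float,
              natural_w: float, natural_h: float) -> bool:
    return abs(rendered_w / rendered_h - natural_w / natural_h) <= ASPECT_TOL


def overlaps_band(column_content_bottom: float, band_top: float) -> bool:
    """True if a column's content paints into the next full-width band."""
    return column_content_bottom > band_top
```

Measuring the largest inter-child gap, not just the bottom void, is what catches the space-between case: a panel whose bottom void reads 0 can still hold a 347px hole between a figure and its caption.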

Hard constraint (the only one): every piece of text and every figure label must be fully visible inside its panel and readable at 1–2 m. No overflow, no clipping, no text shrunk to illegibility. If content does not fit, cut or condense it (back in the outline) — never let it spill or shrink to dust.

The final poster is ${RUN_DIR}/poster.png (+ poster.html for editing).


Step 6: Visual read — your own eyes, every render

This is the "eyes" of the iteration loop (Step 5, check b). After each render, Read ${RUN_DIR}/poster.png yourself and judge it as a poster: visual hierarchy (does the title / claim / hero dominate?), density (any panel cramped or too sparse?), balance (columns even? whitespace intentional?), and readability at 1–2 m. Note the top 2–3 issues and let them drive the next edit — exactly the subjective read the deterministic geometry check can't give you. No external model needed — this is your own judgment on the rendered PNG.


Step 7: Blind-reader content check — you orchestrate

A good-looking poster can still fail to convey the paper. This is the PaperQuiz idea (PosterAgent metric, arXiv:2505.21497): its whole value is that the answerer is blind — it sees only the poster, never the paper — so a wrong answer means the poster didn't carry that content. You authored the poster and have the paper in context, so you cannot answer blind yourself (you'd score inflated). Keep the independence by splitting the roles:

  1. You write the questions (you know the paper). Draft 5–8 multiple-choice questions across problem / method / result / takeaway (+ optional contribution / limitation), with distractors pulled from sibling sections so wrong options stay plausible. Keep the correct answers to yourself.
  2. A blind subagent answers them. Spawn one subagent (the Agent tool) whose context contains only the rendered poster.png and the questions — not the paper, digest, or outline. Ask it to answer each (single letter A/B/C/D + a one-line "where on the poster I saw it", or ? if absent). Because it has only the poster, its answers measure what the poster actually communicates.
  3. You score by role. Compare its answers to your key; any role it misses (miss rate ≥ 0.5) is content that isn't landing — make that role bigger, clearer, or add the missing fact, then re-render.
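The role scoring above is a per-role miss rate. A minimal sketch, assuming you keep the answer key as a list of {id, role, key} records and have already reduced the subagent's reply to one letter (or ?) per question id; the field and function names are illustrative:

```python
from collections import defaultdict

MISS_THRESHOLD = 0.5  # a role with miss rate >= 0.5 is not landing


def miss_rates(questions: list[dict], answers: dict[str, str]) -> dict[str, float]:
    """questions: [{'id', 'role', 'key'}]; answers: id -> 'A'..'D' or '?'.

    '?' (not found on the poster) and an absent answer both count as misses.
    """
    asked: dict[str, int] = defaultdict(int)
    missed: dict[str, int] = defaultdict(int)
    for q in questions:
        asked[q["role"]] += 1
        given = answers.get(q["id"], "?").strip().upper()
        if given != q["key"].upper():
            missed[q["role"]] += 1
    return {role: missed[role] / asked[role] for role in asked}


def failing_roles(rates: dict[str, float]) -> list[str]:
    return sorted(role for role, rate in rates.items() if rate >= MISS_THRESHOLD)
```

With only 5–8 questions, a role usually has one or two questions, so a single miss can already reach the 0.5 threshold; read the subagent's "where I saw it" notes before deciding what to enlarge.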

Run this at milestones (after the poster reads cleanly, and before user preview), not on every micro-edit — each round spawns an agent. Use it to: drive a repair round on a failing role; decide if it's good enough to ship (all key roles answered correctly); and re-rank candidates when you tried more than one layout.

You fix content fidelity by editing the outline / poster.html directly.


Step 8: Preview & iterate with the user [INTERACT]

Once your own iteration (Steps 5–7) has the poster reading cleanly and scoring well, show it to the user:

  1. Get the current design into your context: Read ${RUN_DIR}/poster.png. If image reading is unavailable in your harness, rely on the deterministic geometry check (Step 5a) and say so when you present.
  2. Briefly describe the design choices you made (layout, what dominates, claim, which sections use a figure vs text).
  3. Use AskUserQuestion to offer:
    • Accept — deliver ${RUN_DIR}/poster.png as final.
    • Revise — the user points at something; you edit poster.html directly and re-render. Same scored iteration as Steps 5–7 — re-run the three checks after the edit. No fixed repair vocabulary.
    • Restyle — try a different visual direction (different grid, color, or what dominates, or swapping a figure for text / text for a figure) by editing poster.html.

When approved, report the final poster.png from the run directory set by outline.poster_intake.output_dir: ${RUN_DIR}/poster.png by default, or the folder the user chose in Step 2 (every artifact for this run already lives there).


Step 9: Collect the deliverable next to the PDF

By default the deliverable is buried in .paper2anything/poster/<stem>/ and hard to find. Once finalized, copy it to a <stem>_poster/ directory alongside the PDF (the copy inside .paper2anything stays untouched) so the user can open it right next to the paper:

pdf_path="/path/to/paper.pdf"
RUN_DIR="$(dirname "$pdf_path")/.paper2anything/poster/$(basename "${pdf_path%.*}")"   # if Step 2 changed output_dir, use your actual run directory
DEST="${pdf_path%.*}_poster"          # same directory as the PDF, same name + _poster suffix
i=2; while [ -e "$DEST" ]; do DEST="${pdf_path%.*}_poster_v$i"; i=$((i+1)); done   # on name collision, append _v2, _v3
mkdir -p "$DEST"
cp "$RUN_DIR/poster.png" "$RUN_DIR/poster.html" "$DEST/"
[ -d "$RUN_DIR/images" ] && cp -r "$RUN_DIR/images" "$DEST/"   # figures are referenced as images/<name>, so bring them along

Put poster.png, poster.html, and images/ (poster.html references figures as images/<name>) into the <stem>_poster/ subdirectory; <stem>_poster/poster.png is the final poster.


Outline JSON Format

{
  "title": "Paper Title",
  "authors": "Author1, Author2",
  "affiliations": "University of X",
  "contact": "email@example.com",
  "poster_intake": {
    "size": "20x15 in landscape",
    "venue": "AAAI poster session",
    "author_policy": "parsed",
    "output_target": "html_png",
    "output_dir": "<pdf dir>/.paper2anything/poster/<stem>",
    "visual_policy": "original_figures_or_text"
  },
  "color_scheme": {
    "primary": "#1B3A5C", "secondary": "#2E86AB",
    "accent": "#A3D5FF", "background": "#FFFFFF", "text": "#1A1A2E"
  },
  "sections": [
    {
      "title": "Method", "column": "middle",
      "content": [
        "**Key Idea**: one-sentence summary",
        "Step 1: …", "Step 2: …"
      ],
      "figure": "figures/fig1.png"
    }
  ]
}
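A quick structural check on a hand-written outline can catch key and color typos before you render. This is a minimal sketch that assumes the keys shown in the example above are the ones that matter. Column names other than "middle" are a guess, and check_outline is a hypothetical helper, not a script shipped with the skill.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")
COLOR_KEYS = ("primary", "secondary", "accent", "background", "text")
# Assumption: the example only shows "middle"; "left" and "right" are guesses.
COLUMNS = {"left", "middle", "right"}


def check_outline(outline):
    """Hypothetical helper: return human-readable problems (empty list = basics look fine)."""
    problems = []
    for key in ("title", "authors", "poster_intake", "color_scheme", "sections"):
        if key not in outline:
            problems.append(f"missing top-level key: {key}")
    colors = outline.get("color_scheme", {})
    for key in COLOR_KEYS:
        value = colors.get(key)
        if not isinstance(value, str) or not HEX_COLOR.fullmatch(value):
            problems.append(f"color_scheme.{key} is not #RRGGBB: {value!r}")
    for i, section in enumerate(outline.get("sections", [])):
        if not section.get("title"):
            problems.append(f"sections[{i}] has no title")
        if section.get("column") not in COLUMNS:
            problems.append(f"sections[{i}].column is unexpected: {section.get('column')!r}")
        content = section.get("content", [])
        if not isinstance(content, list) or not all(isinstance(s, str) for s in content):
            problems.append(f"sections[{i}].content must be a list of strings")
    return problems
```

Load the file with json.load and print each returned problem. An empty list only means the basics are present. It says nothing about whether the poster reads well.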

Suggested starting palettes (pick whatever the design calls for — color_scheme is free): CS/AI blue (#1B3A5C/#2E86AB), Bio/Med green (#2D6A4F/#52B788), Physics/Math purple (#5A189A/#9D4EDD), Engineering orange (#E76F51/#F4A261). More in references/color_palettes.md.


Troubleshooting

  • MinerU 401: token missing — set MINERU_API_TOKEN from https://mineru.net/apiManage/token.
  • MinerU OSS download stalls: requests through MinerU's presigned OSS URLs must bypass the system proxy — parse_pdf.py already does this with a trust_env=False session (setting proxies={...} alone doesn't work because requests still honors ALL_PROXY).
  • Playwright missing: pip install playwright && playwright install chromium. The geometry check (Step 5, check a) and screenshot.py both need it.
  • No external VLM / LLM: this skill calls no vision/LLM API. Figure choice, design, the visual read (Step 6), and the content check (Step 7, blind subagent) are all done by you; the geometry check (Step 5a) is pure pixel math. The only network dependency is MinerU for PDF parsing (Step 1). For a fully offline run, parse the PDF elsewhere and drop content.md + mineru_raw.json + figures/ into parsed/ by hand.
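For reference, the proxy bypass looks roughly like this with requests. This is a sketch for the case where you need to fetch a presigned URL yourself. download is a hypothetical helper and is not the code inside parse_pdf.py.

```python
import requests


def make_direct_session():
    """A Session that ignores environment-provided settings.

    trust_env=False makes requests skip HTTP_PROXY / HTTPS_PROXY / ALL_PROXY
    (as well as .netrc credentials and REQUESTS_CA_BUNDLE), so the presigned
    URL is fetched directly instead of through the system proxy.
    """
    session = requests.Session()
    session.trust_env = False
    return session


def download(url, dest, timeout=60):
    """Hypothetical helper: stream a presigned URL to a local file without a proxy."""
    with make_direct_session() as session:
        with session.get(url, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
            with open(dest, "wb") as fh:
                for chunk in resp.iter_content(chunk_size=1 << 20):
                    fh.write(chunk)
```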

For layout principles see references/layout_guide.md and references/poster_design_guide.md. For agent-extracted design rules see references/agent_design_rules_from_posters.md.

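A skill that converts an academic paper PDF into a PPT presentation. It parses the PDF, generates an outline, designs a spec, and renders it, automating the flow end to end. Use it when the user asks to make slides, a PPT, or a presentation from a paper or PDF.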
make slides from a paper generate a deck from this PDF make a PPT from this paper generate slides from a PDF document make a deck from a research paper deck this paper summarize as slides
paper2slides/SKILL.md
npx skills add QuZhan51496/paper2anything --skill paper2slides -g -y
SKILL.md
Frontmatter
{
    "name": "paper2slides",
    "description": "Turn an academic paper PDF into a presentation deck (.pptx) end-to-end. Use this skill whenever the user wants to \"make slides from a paper\", \"generate a deck from this PDF\", \"make a PPT from this paper\", \"generate slides from a PDF document\", \"make a deck from a research paper\", or supplies a research paper PDF and asks for a .pptx out. Trigger even when the user only says \"deck this paper\" or \"summarize as slides\". This is the dedicated, self-contained skill for academic-paper-to-deck flows.",
    "allowed-tools": "Bash, Read, Write, Glob, Grep, Agent, AskUserQuestion"
}

paper2slides

Turn an academic paper PDF into a presentation-ready .pptx. This is a conductor skill: you make the editorial and design judgments, while the mechanical steps are handled by this skill's own self-contained scripts — PDF parsing via the MinerU cloud API (scripts/parse_pdf.py) and .pptx rendering via a PptxGenJS bridge (scripts/render_pptx.py). It orchestrates a "paper → outline → spec → render → QA" pipeline and depends on no other skill.

Quick Reference

  • Stage 0.5 (configure): input user dialogue; output <workdir>/config.json; owner: you (AskUserQuestion to confirm three items: length tier + whether to run visual QA + color scheme)
  • Stage 1 (extract): input paper.pdf; output paper_meta.json + figures_index.json + figures/ + pages/ + equations + hi-res figure/table crops (produced in one shot by the MinerU cloud API); owner: scripts/parse_pdf.py
  • Stage 2 (outline): input paper_meta.json; output slide_outline.json; owner: you (per references/outline-heuristics.md)
  • Stage 3 (spec): input slide_outline.json + figures; output slide_spec.json; owner: you (per references/design-style.md)
  • Stage 4 (render): input slide_spec.json; output output.pptx; owner: scripts/render_pptx.py (PptxGenJS bridge)
  • Stage 5 (qa): input output.pptx; output pass / fail + fix list; owner: content QA always runs; visual QA is gated by Stage 0.5's config.json/visual_qa

See references/pipeline.md for the detailed per-stage protocol.

Invocation Contract

Invocation form:

/paper2slides <paper.pdf> [output.pptx] [--from-stage <name>] [--force]
  • output.pptx omitted → <paper-dir>/<paper-stem>_slides/<paper-stem>.pptx; on a name collision the directory auto-appends _v2 _v3
  • Intermediate artifacts land in <paper-dir>/.paper2anything/slides/<paper-stem>/; when the paper directory is read-only, fall back to ~/.cache/paper2anything/slides/

Python environment: all scripts run in the paper2anything conda environment. Command prefix conda run -n paper2anything --no-capture-output python -m scripts.<name> ... (the prefix can be omitted once conda activate paper2anything is in effect). Every -m scripts.<name> must be run from this skill's directory (the parent of scripts/), otherwise you get No module named 'scripts'; the prefix-omission convention is in references/pipeline.md §General Conventions.

The first step is always to resolve the workspace:

conda run -n paper2anything --no-capture-output python -m scripts.workdir resolve \
    <paper.pdf> [--output <out.pptx>] --ensure

The returned JSON contains all the named paths (paper_meta_path, slide_outline_path, slide_spec_path, figures_dir, ...) and each stage's completion status. Every later stage references the paths in this JSON — do not assemble paths yourself; the rules are centralized in scripts/workdir.py.

Re-run semantics:

  • default: already-completed stages are skipped (judged by whether the output files exist)
  • --force: ignore all markers and run everything
  • --from-stage <name>: re-run starting from the named stage (the entry point for the long-lived "interactive mode")

<name> ∈ {configure, extract, outline, spec, render, qa}. To go through the three pre-flight questions again, use --from-stage configure (overwrites the old config.json).
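As a mental model, the three re-run modes select stages roughly as follows. The authoritative logic lives in scripts/workdir.py and may differ in detail. stages_to_run and its arguments are hypothetical.

```python
STAGES = ["configure", "extract", "outline", "spec", "render", "qa"]


def stages_to_run(completed, from_stage=None, force=False):
    """Hypothetical model of the re-run flags: which stages execute, in order.

    completed:  names of stages whose output files already exist
    from_stage: the value of --from-stage, if given
    force:      True when --force is given
    """
    if force:
        return list(STAGES)  # ignore all markers
    if from_stage is not None:
        if from_stage not in STAGES:
            raise ValueError(f"unknown stage: {from_stage}")
        return STAGES[STAGES.index(from_stage):]  # the named stage and everything after it
    return [stage for stage in STAGES if stage not in completed]  # default: skip finished stages
```

For example, after editing slide_spec.json, `--from-stage render` re-runs render and qa even though both outputs already exist.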

Pipeline

Execute in the order below. Each stage has a "completion test" — once its output file appears the stage counts as done, and is auto-skipped on re-run. Each stage in this section writes only three things: (a) the minimal action to do (including the command to type), (b) the single most error-prone pitfall, and (c) a pointer to the corresponding Stage in references/pipeline.md. The full protocol, prerequisites, common errors, and edge cases all live in pipeline.md; this section does not restate them.

Stage 0.5 — Configure (you, AskUserQuestion to confirm three items)

After Stage 0 resolves the workspace and before Stage 1, use [AskUserQuestion] to confirm three items with the user, and Write the answers to the Stage 0 JSON's config_path (<workdir>/config.json):

  1. deck_length: concise / standard / detailed / auto (no page-count target, recommended)
  2. visual_qa: true (default, adds the soffice→jpg→subagent visual loop) / false (run only the cheap content QA)
  3. color_scheme: auto (default, Stage 3 matches a palette to the paper's character) / custom (the user describes a preference in one line, stored in config and parsed by Stage 3)
  • Reuse = skip: when config.json already exists and neither --from-stage configure nor --force is given, don't ask again — reuse the last configuration.
  • When the user's initial request already states a preference, set the corresponding item as the AskUserQuestion default (still show it for confirmation).
  • The full option table, the deck_length page-count band mapping, and the prefill / downstream-consumption details are in references/pipeline.md §Stage 0.5; the config.json schema is in references/schemas.md.
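A minimal sketch of the reuse-or-write rule for config.json. It assumes the three keys listed above and their documented defaults. This section does not say where a "custom" color description is stored, so follow references/schemas.md for any extra fields. load_or_write_config is a hypothetical helper.

```python
import json
from pathlib import Path

# Defaults from the list above; extra fields should follow references/schemas.md.
DEFAULTS = {"deck_length": "auto", "visual_qa": True, "color_scheme": "auto"}
DECK_LENGTHS = {"concise", "standard", "detailed", "auto"}


def load_or_write_config(config_path, answers=None, reconfigure=False):
    """Hypothetical helper: reuse config.json if present, else write the confirmed answers.

    reconfigure=True models --from-stage configure / --force (ask again, overwrite).
    """
    path = Path(config_path)
    if path.exists() and not reconfigure:
        return json.loads(path.read_text(encoding="utf-8"))  # reuse = skip the questions
    config = {**DEFAULTS, **(answers or {})}
    if config["deck_length"] not in DECK_LENGTHS:
        raise ValueError(f"bad deck_length: {config['deck_length']!r}")
    if not isinstance(config["visual_qa"], bool):
        raise ValueError("visual_qa must be true or false")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8")
    return config
```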

Stage 1 — Extract (script)

Use the MinerU cloud API (requires MINERU_API_TOKEN, configured uniformly in the paper2anything package-root .env; with no token it errors out immediately):

set -a; source <paper2anything package root>/.env; set +a   # export the unified .env (includes MINERU_API_TOKEN)
conda run -n paper2anything --no-capture-output python -m scripts.parse_pdf <paper.pdf>

Produces paper_meta.json + figures_index.json + pages/ + hi-res crops in one shot (the structured metadata comes directly from MinerU).

For --dpi tuning (default 300) and the known imperfections of MinerU parsing (e.g. the bbox sometimes pins the y start onto a subfigure caption), see references/pipeline.md §Stage 1.

Pitfall: before entering Stage 2 you must run the 4 checks at the end of references/schemas.md (title/authors/same-kind merge/missing key kind) to verify paper_meta.json; the check results are not written back to paper_meta.json, but are reflected directly in the Stage 2 outline.

Stage 2 — Outline (you)

Input paper_meta.json + figures_index.json + config.json → output slide_outline.json (schema in references/schemas.md). Per references/outline-heuristics.md, set roles and order, and write each slide's title/bullets/figure_ref/speaker_notes. Pitfall: read config.json/deck_length first — auto does not constrain the slide count; a non-auto value is a soft target for outline granularity, and you must not cut core narrative roles just to hit a number (details in outline-heuristics.md).

After writing, validate that the JSON is well-formed with Python:

conda run -n paper2anything --no-capture-output python -c \
    "import json,sys; json.load(open(sys.argv[1])); print('ok')" \
    <workdir>/slide_outline.json

For the full protocol and common errors, see references/pipeline.md §Stage 2.

Stage 3 — Spec (you)

Input slide_outline.json + figures_index.json + figures/ + pages/ → output slide_spec.json. Per references/design-style.md, choose palette/fonts/layout_kind (avoiding consecutive repeats), and translate the content into elements. Pitfall: every number and term must have a source in paper_meta / figures_index — do not fabricate.

When you need an icon use kind:"icon"; for naming see the "Icons" section of references/pptxgenjs.md, and for the schema see the icon element in references/schemas.md.

When you need to crop a region from a full paper page:

conda run -n paper2anything --no-capture-output python -m scripts.page_screenshot \
    <workdir> <page> <x> <y> <w> <h>

bbox uses relative ratios 0..1; fill the output relative path into the corresponding image element's path. Hard gate for cropping: the first call's bbox must be value-for-value equal to figures_index.json/captions[i].bbox, and you may not eyeball the full page before cropping the first version — for the full baseline and the QA re-crop loop see references/design-style.md §3 (skipping the first cut and going straight to eyeballing = violating §3). The Stage 3 key constraints (coordinates ≤ canvas / margin:0 etc.) are in references/pipeline.md §Stage 3.
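The relative-to-pixel arithmetic and the first-cut gate can be sketched as follows. The sketch assumes the bbox is ordered (x, y, w, h) like the CLI arguments, and that figures_index.json stores caption bboxes in the same order. Both function names are hypothetical.

```python
def to_pixel_box(bbox, page_width, page_height):
    """Hypothetical helper: relative (x, y, w, h) in 0..1 -> (left, top, right, bottom) pixels."""
    x, y, w, h = bbox
    for name, value in zip("xywh", bbox):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside 0..1")
    eps = 1e-9  # tolerate float round-off right at the page edge
    if x + w > 1.0 + eps or y + h > 1.0 + eps:
        raise ValueError("bbox runs past the page edge")
    return (
        round(x * page_width),
        round(y * page_height),
        round((x + w) * page_width),
        round((y + h) * page_height),
    )


def is_first_cut_ok(bbox, caption_bbox):
    """Hard gate: the first crop must reuse the caption bbox value for value."""
    return list(bbox) == list(caption_bbox)
```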

Stage 4 — Render (script)

conda run -n paper2anything --no-capture-output python -m scripts.render_pptx \
    <workdir>/slide_spec.json <workdir>/output.pptx

The output is <workdir>/output.pptx; after a successful render, copy it to the final output_path given by Stage 0 (first mkdir -p its parent directory <paper-stem>_slides/). Pitfall: it depends on node+pptxgenjs (installed globally); if node is not on PATH the script errors. On failure, first --dry-run to generate only render/build.js and locate the spec problem. For prerequisites and common errors see references/pipeline.md §Stage 4.

Stage 5 — QA

Execute per the QA section below: content QA always runs, and visual QA is gated by config.json/visual_qa.

Defaults & Errors

Default behavior

  • output.pptx omitted: <paper-dir>/<paper-stem>_slides/ (containing <paper-stem>.pptx); on a name collision the directory appends _v2, _v3
  • work directory not writable in the paper directory: fall back to ~/.cache/paper2anything/slides/<paper-stem>-<hash12>/
  • re-run on the same paper: already-completed stages (output files already exist) are auto-skipped
  • config.json already exists: Stage 0.5 skips the questions and reuses the last configuration; to change config use --from-stage configure
  • Stage 0.5 not asked / visual_qa not specified: deck_length=auto, visual_qa=true (run visual QA)
  • --from-stage <name>: force a re-run from the named stage, without checking outputs
  • --force: force a re-run of all stages (rare, only on a schema upgrade)

Error-Recovery Quick Reference

Only the few classes that need your judgment / routing are listed (all technical recovery is in pipeline.md):

  • Stage 1 section count < 5 or > 15: you manually add / merge during the Stage 2 check
  • Stage 5 reports "card bottom half empty / unbalanced columns / bottom whitespace": not soft; fix per the three-lever model in references/design-style.md "Principles for Fixing QA Issues" (adjust text amount > adjust bullet spacing > adjust image size, stackable), then --from-stage render to re-run
  • user/QA reports "leader markers not aligned with text": per references/design-style.md "Visual-Richness Recommendations" item A, batch-reset icon_y to the alignment formula + a final self-check, then --from-stage render to re-run
  • skill triggered but the user only wants to "read the PDF": a mis-trigger; tell the user this skill builds a slide deck and ask whether to proceed; do not continue with paper2slides if they only want to read the PDF

For the full error recovery (Stage 1/4 technical, annotation green box, dpi, figure_ref, etc.) see the "Error-Recovery Quick Reference" section of references/pipeline.md.

QA

Apply the QA loop in references/qa.md (the self-contained "Verification Loop": content QA + visual subagent review). Read config.json/visual_qa first. The key points:

  • content QA always runs: markitdown checks placeholders / number consistency / bullets not lifted from the abstract / no leftover placeholder in the title.
  • visual QA runs only when config.json/visual_qa == true: soffice→pdf→jpg→dispatch a single subagent to batch-review.
  • After fixing issues, edit slide_spec.json then --from-stage render to fully re-render before QA; the recheck rounds narrow per the Verification Loop — round 1 covers the full deck, from round 2 on look only at the pages flagged last round ∪ the pages changed this round, with a final full pass over the whole deck before convergence.
  • The final report must state whether visual QA was run; when skipped, note "visual QA was skipped per config; to change config use --from-stage configure then --from-stage qa".
  • Whitespace / unbalanced columns / misaligned leader markers are not soft, they are hard issues that must be fixed — this is where this skill most often misjudges; don't wave them off as "soft" when reviewing the subagent's report.
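The recheck narrowing amounts to a small set operation. This sketch assumes pages are identified by number. pages_to_review is a hypothetical helper, and references/qa.md remains the authority on when a pass counts as final.

```python
def pages_to_review(round_no, all_pages, flagged_last_round=(), changed_this_round=(), final_pass=False):
    """Hypothetical helper: pages the visual subagent looks at in a given QA round.

    Round 1 and the final pass before convergence cover the whole deck; rounds in
    between cover only last round's flagged pages plus the pages changed this round.
    """
    if round_no == 1 or final_pass:
        return sorted(all_pages)
    return sorted(set(flagged_last_round) | set(changed_this_round))
```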

For the verification loop and the base visual-subagent prompt template, see references/qa.md; for the full A/B protocol and the qa_log.json structure, see references/pipeline.md §Stage 5; for the visual subagent prompt's two added paragraphs, the three-lever fix model, and the recheck-narrowing details, see the "Visual Subagent Prompt for QA" + "Principles for Fixing QA Issues" sections of references/design-style.md.

Where to Look When Stuck

This skill is self-contained — everything you need is under references/ in this directory:

  • MinerU parsing anomalies (missing figure / garbled text / misaligned table / lost formula): references/pipeline.md §Stage 1 + scripts/lib/mineru_client.py (token, model_version, zip-download retry)
  • PPT visual design, color palettes, layout choices, taboos: the "Design fundamentals" + "Avoid" sections of references/design-style.md
  • PptxGenJS API usage, gotchas, icon generation: references/pptxgenjs.md (includes the academic icon-name table)
  • QA flow and the visual-subagent prompt: references/qa.md
  • Detailed JSON schema fields for each stage's output (including config.json): references/schemas.md
  • How sections map to slide roles: references/outline-heuristics.md
  • Matching color/layout to the paper's scenario: references/design-style.md
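Turns an academic paper PDF into a long-form, in-depth explainer for a WeChat Official Account. The agent leads the understanding and writing, and calls scripts to parse the PDF, generate the cover, and publish a draft. It supports interactive confirmation of the angle and content; the target readers are AI/ML researchers.
paper to Official Account paper2wechat write the paper up as an Official Account article paper to WeChat post PDF to Official Account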
paper2wechat/SKILL.md
npx skills add QuZhan51496/paper2anything --skill paper2wechat -g -y
SKILL.md
Frontmatter
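{
    "name": "paper2wechat",
    "description": "Turn an academic paper PDF into an in-depth explainer post for a WeChat Official Account (long-form article + figures + cover). A conductor skill where you lead the design: the mechanical work (MinerU PDF parsing, cover generation, publishing to the draft box via md2wechat) goes to small tools under scripts\/, while you personally handle understanding the paper, structuring the article, and writing the long-form piece, confirming with the user at key points. Trigger when the user says \"paper to Official Account\", \"paper2wechat\", \"write the paper up as an Official Account article\", \"paper to WeChat post\", or \"PDF to Official Account\".",
    "allowed-tools": "Bash, Read, Write, Glob, Grep, AskUserQuestion, SendUserFile"
}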

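paper2wechat: Paper to WeChat Official Account Explainer (Conductor Style, Led by You)

Turn a paper PDF into a long-form, academic, in-depth explainer for a WeChat Official Account. You are the lead writer: this file is a recipe, not a fully automated script, and there is no main.py. Mechanical steps (parsing / cover / layout) call small tools under scripts/; you personally handle understanding the paper, structuring the article, and writing it, and you confirm with the user at key points using AskUserQuestion.

Target readers: researchers, engineers, and students with an AI/ML background who can follow method details and care about contributions and limitations.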

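PDF
 → parse                        (parse_pdf.py: MinerU → parsed/ + figures/, including tables)
 → you understand the paper     (read parsed/ + view figures/) → understanding/paper_understanding.json   [confirm the angle]
 → you write the explainer      (free structure, figures, faithful and accurate) → wechat_article.md + .json   [confirm]
 → cover                        (cover.py: by default generates a landscape 900×383 image via API with gpt-image-2; with no key or an unusable key, falls back to local compositing that reuses an original figure)
 → publish to the draft box     (publish.py: md2wechat pushes straight to the Official Account draft box; no credentials / failure → locally styled HTML)
 → Official Account post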
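How to Run

  1. Go step by step: run the mechanical steps as scripts via Bash, and do the creative steps yourself with Read / Write.
  2. Compute WORKDIR in place at the start of every Bash block (each Bash call is a separate shell, so variables are not shared):
    WORKDIR="$(dirname "$pdf_path")/.paper2anything/wechat/$(basename "${pdf_path%.*}")"
    
    $pdf_path is the paper PDF the user provided (set it again in every block). The scripts live in ${SKILL_DIR}/scripts, where SKILL_DIR is this skill's directory (see the "Base directory for this skill: …" line injected at the top of this skill). Because each Bash block is a separate shell, every block that needs it should export SKILL_DIR=<that directory> once at the start (set fresh in each block, just like WORKDIR).
  3. Pause at two decision points with AskUserQuestion: ① after understanding the paper, confirm the angle / depth / length; ② after the article is drafted, confirm it.
  4. An in-depth explainer means explaining the paper clearly in your own words once you understand it. You may add intuition, analogies, background, applications, and limitations so that readers with the right background can grasp the paper quickly, but stay faithful to the paper, do not exaggerate, and do not fabricate data.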

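Step 0: Environment and Credentials

Unified environment: all python commands run in paper2anything's unified conda environment (top-level environment.yml) and are prefixed with conda run -n paper2anything --no-capture-output. md2wechat is already included in that environment.

Credentials are centralized in the package-root .env (copied from .env.example, already gitignored). Export it once in every new shell:

set -a; source <paper2anything package root>/.env; set +a

Keys used by this skill (you do the understanding and writing yourself; no LLM API is called):

  • MINERU_API_TOKEN: PDF parsing (required)
  • OPENAI_API_KEY (+ OPENAI_BASE_URL): used by default to generate the cover image (gpt-image-2); with no key or an unusable key, the cover falls back to local compositing (reusing an original paper figure)
  • WECHAT_APPID / WECHAT_APP_SECRET: used to push straight to the Official Account draft box (md2wechat calls the official API; see "Troubleshooting" for how to obtain them); if left empty, publishing degrades to locally generated styled HTML for manual pasting
  • MD2WECHAT_THEME: layout theme (the default, default, is academic gray; tech/festival/announcement are also available)

Dependency self-check (install whatever is missing as prompted; all dependencies are in environment.yml):

conda run -n paper2anything --no-capture-output python -c "import requests, rich, dotenv" 2>&1
md2wechat --help >/dev/null 2>&1 && echo "md2wechat ready" || echo "md2wechat not ready (can be installed later; without it Step 5 falls back to locally styled HTML for manual pasting)"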

Step 1: Parse the PDF (script)

pdf_path="/path/to/paper.pdf"          # ← the user's paper PDF
WORKDIR="$(dirname "$pdf_path")/.paper2anything/wechat/$(basename "${pdf_path%.*}")"
conda run -n paper2anything --no-capture-output \
  python "${SKILL_DIR}/scripts/parse_pdf.py" "$pdf_path" --workdir "$WORKDIR"

Outputs (under $WORKDIR): parsed/paper_meta.json, parsed/sections.json, parsed/figures_index.json, parsed/tables_index.json ([{table_id, caption, html, image_path, page}]), parsed/references.json, plus figures/* (including table images).

After parsing, Read parsed/sections.json and parsed/paper_meta.json to go through the full text.


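Step 2: Understand the Paper → Write understanding (you do this) [confirm]

This is the foundation of the explainer; make the judgments yourself:

  1. Read parsed/sections.json (full text) + paper_meta.json. Read the figure and table captions in figures_index.json / tables_index.json (a few captions may be empty, so go by what the figure actually shows), and actually Read the key figures (under figures/) to judge which are worth embedding and which suits a landscape cover.
  2. Write to understanding/paper_understanding.json:
    {
      "paper_title": "...", "method_name": "short method name",
      "one_sentence_summary": "the contribution in one sentence",
      "problem": "background and the problem being solved", "method": "core method (technical points, in words rather than formulas)",
      "method_intuition": "intuitive explanation / analogy that helps readers grasp it",
      "contributions": ["contribution 1", "contribution 2"],
      "comparison": "key differences from the main baselines",
      "experiment_results": ["key data (with concrete numbers)", "..."],
      "limitations": "limitations the paper acknowledges, or potential weaknesses",
      "keywords": ["keyword", "..."],
      "cover_palette": {"bg": "#F4F5F7", "accent": "#2E86AB"},
      "important_figures": [
        {"figure_id": "fig_1", "image_path": "<the real path from figures_index.json>",
         "suitable_for_cover": true, "importance_score": 0.9,
         "wechat_caption": "Figure 1: …… (a Chinese caption of ≤50 characters)", "description": "figure description"}
      ]
    }
    
    • important_figures must include image_path (taken from figures_index.json and actually existing), suitable_for_cover, and importance_score. The cover goes through API image generation (gpt-image-2) by default; only when OPENAI_API_KEY is unset or unusable does it fall back to local compositing, which relies on these fields to pick a landscape original figure. If they are missing, the fallback has no image → cover skipped.
    • cover_palette (optional): the color scheme for the local-compositing fallback. Choose bg (light base) + accent (highlight color) by the paper's field; the title text color adapts automatically to how dark the background is. Suggested light palettes: general #F4F5F7+#2E86AB, biology #EEF6F0+#2D8A5F, physics/math #F1ECF8+#6A30C2, engineering #FBF0EC+#D85A3C, social sciences #F4EEF2+#8A5A78, chemistry #EAF4F8+#0E86C0.
  3. Use AskUserQuestion to confirm the angle / depth / target length with the user (e.g. method details vs. intuitive popular science, about 1500 vs. 2500 characters).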
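A missing field silently leaves the local cover fallback without an image, so it can be worth checking the file right after you write it. This minimal sketch assumes relative image_path values resolve against WORKDIR. check_important_figures is a hypothetical helper. The same check applies to paper2xhs, which uses the same three fields.

```python
from pathlib import Path

REQUIRED_FIGURE_KEYS = ("image_path", "suitable_for_cover", "importance_score")


def check_important_figures(understanding, workdir):
    """Hypothetical helper: problems that would leave the local cover fallback without an image."""
    problems = []
    figures = understanding.get("important_figures") or []
    if not figures:
        problems.append("important_figures is empty")
    for i, fig in enumerate(figures):
        for key in REQUIRED_FIGURE_KEYS:
            if key not in fig:
                problems.append(f"important_figures[{i}] is missing {key}")
        image_path = fig.get("image_path")
        # Assumption: relative paths resolve against WORKDIR; absolute paths are used as-is.
        if image_path and not (Path(workdir) / image_path).is_file():
            problems.append(f"important_figures[{i}].image_path does not exist: {image_path}")
    if figures and not any(f.get("suitable_for_cover") for f in figures):
        problems.append("no figure is marked suitable_for_cover")
    return problems
```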

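Step 3: Write the In-Depth Explainer (you do this) [confirm]

Write it yourself in the style of an Official Account deep dive, using Write to produce wechat_article.md and wechat_article.json:

Official Account deep-dive rules (domain knowledge):

  • Length about 1500–2500 characters (adjust for the paper's complexity and what was agreed in Step 2).
  • Free structure that follows the paper; a fixed four-section layout is not required. A handy skeleton:
    1. Lead: why this paper is worth reading (1 paragraph, opening with a question or a highlight as the hook)
    2. Background and problem: shortcomings of existing methods
    3. Core method: explain the idea clearly, pair it with the framework figure, and use analogies / intuition where helpful
    4. Key experiments and results: give concrete numbers, with result figures / tables
    5. Significance, applications, and limitations: where it can be used and what it lacks
    6. Closing: a one-sentence summary + further thoughts
  • Use H2 (## section title) for sections; give key technical terms in both Chinese and English on first appearance, optionally in **bold**.
  • Figures: insert ![caption](figures/<image name>) at suitable places. The .md and figures/ both sit in the workspace root .paper2anything/wechat/<stem>/, hence the figures/... path. Take <image name> directly from the file name of image_path in figures_index.json, including its real extension (high-res re-crops are .png and fallback reuses of extracted images are .jpg, but go by the actual image_path); do not guess or change the suffix.
  • Faithful and accurate: quote experimental numbers as they are, without exaggerating or fabricating. You may offer interpretation and insight, but distinguish "what the paper says" from "your commentary".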
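A quick way to catch figure references whose name or extension does not match a real file is sketched below. It assumes the article references images only under figures/ relative to WORKDIR. broken_figure_refs is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches ![alt](target) and captures the target path.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def broken_figure_refs(markdown, workdir):
    """Hypothetical helper: image references that do not resolve to figures/<name> under workdir."""
    broken = []
    for ref in IMAGE_REF.findall(markdown):
        if not ref.startswith("figures/") or not (Path(workdir) / ref).is_file():
            broken.append(ref)
    return broken
```

A reference that points at fig2.jpg when the extracted file is fig2.png shows up here, which is exactly the "do not change the suffix" mistake.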

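Deliverables: wechat_article.md starts with # {title} on its first line, followed by the body (with figures). wechat_article.json is read by the layout script for title/digest/word_count:

{"title": "final title", "digest": "Official Account summary, ≤120 characters", "word_count": 2200}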
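A minimal sketch of sanity checks on the two files, assuming the field names above. rough_word_count is a heuristic that counts one per CJK character and one per Latin word or number, and it may differ from how word_count is meant to be computed. Both helpers are hypothetical.

```python
import re

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
LATIN_WORD = re.compile(r"[A-Za-z0-9]+")


def rough_word_count(text):
    """Heuristic (assumption): each CJK character counts as one, each Latin/digit run as one."""
    return len(CJK_CHAR.findall(text)) + len(LATIN_WORD.findall(text))


def check_article(meta, markdown):
    """Hypothetical helper: basic consistency checks for wechat_article.json + .md."""
    problems = []
    title = meta.get("title", "")
    if not title:
        problems.append("title is empty")
    if len(meta.get("digest", "")) > 120:
        problems.append("digest is longer than 120 characters")
    if not isinstance(meta.get("word_count"), int):
        problems.append("word_count should be an integer")
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != f"# {title}":
        problems.append("first line of wechat_article.md should be '# {title}'")
    return problems
```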

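After writing, use AskUserQuestion to show the user the title + summary + section structure, and confirm or revise per their feedback (you can edit the .md/.json directly).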


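Step 4: Generate the Cover (script, optional)

Draft the cover headline yourself at this point (having read the paper thoroughly, you will do better than picking one out of the JSON) and pass it via --title:

pdf_path="/path/to/paper.pdf"
WORKDIR="$(dirname "$pdf_path")/.paper2anything/wechat/$(basename "${pdf_path%.*}")"
conda run -n paper2anything --no-capture-output \
  python "${SKILL_DIR}/scripts/cover.py" --workdir "$WORKDIR" \
  --title "your drafted cover headline"

Landscape 900×383 JPG: by default it uses OPENAI_IMAGE_MODEL (default gpt-image-2) to generate a landscape image and then crops it, with the headline from your --title (only if that is empty does it fall back to the article title / method_name). If OPENAI_API_KEY is unset or unusable, it falls back to local compositing and crops the highest-scoring suitable_for_cover original figure in understanding.important_figures into a cover (overlaid with --title, colors from cover_palette). If neither is available, the cover is skipped. Output: cover.jpg.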
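The fallback's figure choice, and one plausible way to fit a figure to 900×383, can be sketched as below. The center crop is an assumption, because this section does not document how cover.py actually crops. Both function names are hypothetical.

```python
def pick_cover_figure(important_figures):
    """Hypothetical helper: highest importance_score among suitable_for_cover figures, or None."""
    candidates = [f for f in important_figures
                  if f.get("suitable_for_cover") and f.get("image_path")]
    if not candidates:
        return None  # nothing to reuse, so the local fallback skips the cover
    return max(candidates, key=lambda f: f.get("importance_score", 0.0))


def center_crop_box(width, height, target_w=900, target_h=383):
    """Assumed crop: largest centered (left, top, right, bottom) box with the target aspect ratio."""
    if width * target_h > height * target_w:  # image is wider than 900:383, trim the sides
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    new_h = width * target_h // target_w      # image is taller (or equal), trim top and bottom
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)
```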


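Step 5: Publish to the Official Account Draft Box (script + your confirmation, optional)

publish.py uses md2wechat to push the article straight into the Official Account draft box (it uploads the cover and body images to the media library, then creates a draft). This needs WECHAT_APPID/WECHAT_APP_SECRET, the server's egress IP on the whitelist, and a verified Official Account. With no credentials, or if the upload fails, it automatically degrades to locally generated styled HTML for manual pasting. If you are not publishing, skip this step and hand the outputs to the user.

① Check credentials (this decides between a direct push and the local fallback):

export SKILL_DIR=<this skill directory>
conda run -n paper2anything --no-capture-output python "${SKILL_DIR}/scripts/publish.py" --check-creds

0 = credentials present, a direct push is possible (go to ②); 2 = not configured, take the local fallback in ③.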
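If you drive the branch from Python instead of reading the exit code by hand, a sketch might look like this. It assumes the credentials from .env are already exported into the environment that launches it, and that `conda run` passes the script's exit code through. The helper names are hypothetical, and a direct push still requires the user's confirmation in ② first.

```python
import subprocess

PREFIX = ["conda", "run", "-n", "paper2anything", "--no-capture-output", "python"]


def check_creds_exit_code(skill_dir):
    """Hypothetical helper: run publish.py --check-creds and return its exit code."""
    return subprocess.run(PREFIX + [f"{skill_dir}/scripts/publish.py", "--check-creds"]).returncode


def publish_mode(exit_code):
    """Map the --check-creds exit code to the branch to take."""
    if exit_code == 0:
        return "push"        # credentials found: confirm with the user, then push to drafts
    if exit_code == 2:
        return "local-only"  # no credentials: render styled HTML for manual pasting
    raise RuntimeError(f"unexpected --check-creds exit code: {exit_code}")


def publish_command(skill_dir, workdir, mode):
    """argv for the publish step; pass it to subprocess.run(..., check=True) after confirmation."""
    cmd = PREFIX + [f"{skill_dir}/scripts/publish.py", "--workdir", workdir]
    if mode == "local-only":
        cmd.append("--local-only")
    return cmd
```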

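② Credentials present → show the user before publishing and get confirmation (a direct push sends content out to the user's Official Account): Read wechat_article.json and send the user the title + summary, and SendUserFile the cover.jpg. Use AskUserQuestion to have the user confirm uploading the draft (drafts are not public; the user still has to mass-send from the back end to make it public). After confirmation, upload:

pdf_path="/path/to/paper.pdf"; export SKILL_DIR=<this skill directory>
WORKDIR="$(dirname "$pdf_path")/.paper2anything/wechat/$(basename "${pdf_path%.*}")"
conda run -n paper2anything --no-capture-output python "${SKILL_DIR}/scripts/publish.py" --workdir "$WORKDIR"

On success it prints the media_id; tell the user to go to mp.weixin.qq.com → Draft Box to preview / mass-send. (md2wechat requires at least one image as the cover, so make sure cover.jpg has been generated.)

③ No credentials (or the user does not want a direct push) → local fallback:

pdf_path="/path/to/paper.pdf"; export SKILL_DIR=<this skill directory>
WORKDIR="$(dirname "$pdf_path")/.paper2anything/wechat/$(basename "${pdf_path%.*}")"
conda run -n paper2anything --no-capture-output python "${SKILL_DIR}/scripts/publish.py" --workdir "$WORKDIR" --local-only

This produces wechat_article.html; tell the user to open it, select all, copy, and paste into the Official Account editor.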


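Step 6: Collect the Deliverables Next to the PDF

By default the deliverables are buried in .paper2anything/wechat/<stem>/ and hard to find. Once the article + figures + cover are finalized (whether or not you ran Step 5), copy them to a <stem>_wechat/ directory alongside the PDF (the copy inside .paper2anything stays untouched) so the user can pick them up right next to the paper:

pdf_path="/path/to/paper.pdf"
WORKDIR="$(dirname "$pdf_path")/.paper2anything/wechat/$(basename "${pdf_path%.*}")"
DEST="${pdf_path%.*}_wechat"          # same directory as the PDF, same name + _wechat suffix
i=2; while [ -e "$DEST" ]; do DEST="${pdf_path%.*}_wechat_v$i"; i=$((i+1)); done   # on name collision, append _v2, _v3
mkdir -p "$DEST"
cp "$WORKDIR/wechat_article.md" "$WORKDIR/wechat_article.json" "$DEST/"
[ -f "$WORKDIR/cover.jpg" ] && cp "$WORKDIR/cover.jpg" "$DEST/"                  # the cover may have been skipped; copy only if it exists
[ -f "$WORKDIR/wechat_article.html" ] && cp "$WORKDIR/wechat_article.html" "$DEST/"  # locally styled HTML from the fallback (absent when the draft push succeeded)
cp -r "$WORKDIR/figures" "$DEST/"     # the body references figures relatively as figures/<name>, so bring them along

wechat_article.md references figures relatively as ![caption](figures/<name>), so the article and figures/ go into the <stem>_wechat/ subdirectory together and the references stay intact.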


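Artifact Locations

Intermediate artifacts land next to the paper in <pdf dir>/.paper2anything/wechat/<stem>/ (multiple papers in the same directory are separated by <stem> and do not overwrite each other); the final deliverables are additionally copied to <stem>_wechat/ alongside the PDF (Step 6):

  • .paper2anything/wechat/<stem>/parsed/: MinerU PIR (meta/sections/figures_index/tables_index/references); written by parse_pdf
  • .paper2anything/wechat/<stem>/figures/: paper figures + table images; written by parse_pdf
  • .paper2anything/wechat/<stem>/understanding/paper_understanding.json: paper understanding + important_figures; written by you (Step 2)
  • .paper2anything/wechat/<stem>/wechat_article.md / .json: the explainer article + metadata; written by you (Step 3)
  • .paper2anything/wechat/<stem>/cover.jpg: landscape cover; written by cover
  • .paper2anything/wechat/<stem>/wechat_article.html: locally styled HTML generated on fallback (not produced when the draft push succeeds); written by publish
  • .paper2anything/wechat/<stem>/logs/: each script's *_result.json; written by the scripts
  • <pdf dir>/<stem>_wechat/: collected deliverables (wechat_article.md + .json + cover.jpg + figures/), next to the PDF; written by you (Step 6)

Re-running overwrites the workspace .paper2anything/wechat/<stem>/ (intermediate artifacts); when the collection step finds an existing <stem>_wechat/, it saves to _v2, _v3 instead and does not overwrite the old deliverables.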


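Troubleshooting

  • MinerU parsing fails: check MINERU_API_TOKEN; the PDF must be ≤200 MB / ≤200 pages, and mineru.net must be reachable. Re-run Step 1 (it overwrites).
  • Cover not generated (skipped): usually there is neither a usable OPENAI_API_KEY nor a reusable original figure. Configure a key to use AI image generation, or make sure understanding.important_figures contains a landscape figure with suitable_for_cover: true and an existing image_path for the local-compositing fallback.
  • Publishing to the draft box fails: you need WECHAT_APPID/WECHAT_APP_SECRET (from the WeChat developer platform developers.weixin.qq.com; resetting the AppSecret invalidates the old one), this machine's egress IP added to the "API IP whitelist", and a verified Official Account (unverified accounts lack draft/add permission and get 404). Diagnose by errcode: 40164 IP not whitelisted, 40001 wrong AppSecret, 40013 wrong AppID, 404 account not verified. To find this machine's egress IP, make one GET https://api.weixin.qq.com/cgi-bin/token with the real credentials: the errmsg of a 40164 states the IP WeChat sees (print only errmsg; never echo the secret). md2wechat also requires at least one image as the cover, so make sure cover.jpg exists.
  • No credentials / no direct push wanted: in Step 5 use --local-only to fall back to locally styled HTML (wechat_article.html) for manual pasting.
  • Understanding and writing need no API key: you do these two steps yourself, and no LLM API is called.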
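To apply the errcode list without ever printing the secret, inspect only errcode and errmsg from the JSON response. This sketch assumes the 40164 errmsg contains the IPv4 address as plain text. The 404 entry mirrors the list above, although it may surface as an HTTP status rather than an errcode. diagnose is a hypothetical helper.

```python
import re

# Hints taken from the troubleshooting list above.
ERRCODE_HINTS = {
    40164: "egress IP is not on the API IP whitelist",
    40001: "AppSecret is wrong (or was reset)",
    40013: "AppID is wrong",
    404: "account is not verified (no draft/add permission)",
}
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def diagnose(response_json):
    """Hypothetical helper: explain a WeChat API payload using only errcode/errmsg, never the secret."""
    code = response_json.get("errcode", 0)
    if code == 0:
        return "ok"
    hint = ERRCODE_HINTS.get(code, "unknown error, see errmsg")
    if code == 40164:
        # Assumption: the errmsg spells out the IP that WeChat sees.
        match = IPV4.search(response_json.get("errmsg", ""))
        if match:
            hint += f" (WeChat sees {match.group(0)})"
    return f"{code}: {hint}"
```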
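Turns an academic paper PDF into a Xiaohongshu (RED) post. The AI leads the angle and copywriting while keeping the content accurate and unexaggerated, and calls scripts to parse the PDF and generate the cover. It pauses for confirmation at two key points, after understanding the paper and before finalizing the copy, so that human and AI produce the content together efficiently.
paper to Xiaohongshu paper2xhs post this paper on Xiaohongshu paper to social media PDF to Xiaohongshu post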
paper2xhs/SKILL.md
npx skills add QuZhan51496/paper2anything --skill paper2xhs -g -y
SKILL.md
Frontmatter
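{
    "name": "paper2xhs",
    "description": "Turn an academic paper PDF into a Xiaohongshu post (title + body + hashtags + cover). A conductor skill where you lead the design: the mechanical work (MinerU PDF parsing, cover generation, semi-automatic publishing) goes to small tools under scripts\/, while you personally handle understanding the paper, choosing the angle, and writing the copy, confirming with the user at key points. Trigger when the user says \"paper to Xiaohongshu\", \"paper2xhs\", \"post this paper on Xiaohongshu\", \"paper to social media\", or \"PDF to Xiaohongshu post\".",
    "allowed-tools": "Bash, Read, Write, Glob, Grep, AskUserQuestion, SendUserFile"
}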

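paper2xhs: Paper to Xiaohongshu (Conductor Style, Led by You)

Turn a paper PDF into a Xiaohongshu post. You are the lead writer: this file is a recipe, not a fully automated script, and there is no main.py. Mechanical steps (parsing / cover / publishing) call small tools under scripts/; you personally handle understanding the paper, choosing the angle, and writing the copy (using Read to view materials and Write to save outputs), and you confirm with the user at key points using AskUserQuestion.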

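PDF
 → parse                        (parse_pdf.py: MinerU → parsed/ + figures/)
 → you understand the paper     (read parsed/ + view figures/) → understanding/paper_understanding.json   [confirm the angle]
 → you write the post           (title / body / hashtags / cover text) → xhs_post.json + xhs_post.md   [confirm the copy]
 → cover                        (cover.py: by default generates an image via API with gpt-image-2; with no key or an unusable key, falls back to local compositing that reuses an original figure)
 → semi-automatic publishing    (publish.py, optional)
 → Xiaohongshu post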

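How to Run

  1. Go step by step: run the mechanical steps as scripts via Bash, and do the creative steps yourself with Read / Write. Do not try to run everything in one command.
  2. Compute WORKDIR in place at the start of every Bash block. Each Bash call is a separate shell and variables are not shared, so do not count on an export surviving across steps:
    WORKDIR="$(dirname "$pdf_path")/.paper2anything/xhs/$(basename "${pdf_path%.*}")"
    
    Here $pdf_path is the path to the user's paper PDF (set it again in every block). The scripts live in ${SKILL_DIR}/scripts, where SKILL_DIR is this skill's directory (see the "Base directory for this skill: …" line injected at the top of this skill). Because each Bash block is a separate shell, every block that needs it should export SKILL_DIR=<that directory> once at the start (set fresh in each block, just like WORKDIR).
  3. Pause at two decision points with AskUserQuestion: ① after understanding the paper, confirm the angle; ② after the copy is drafted, confirm it. If the user wants changes, they can edit the output JSON/MD directly or tell you what to change.
  4. Xiaohongshu calls for accurate, unexaggerated popular science: reflect the paper's contributions faithfully and keep the tone conversational with a hook, but never fabricate data or exaggerate conclusions.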

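Step 0: Environment and Credentials

Unified environment: all python commands run in paper2anything's unified conda environment (created from the top-level environment.yml) and are prefixed with conda run -n paper2anything --no-capture-output.

Credentials are centralized in the .env at the paper2anything package root (copied from .env.example, already gitignored). Export it once in every new shell:

set -a; source <paper2anything package root>/.env; set +a

Keys used by this skill (you do the understanding and copywriting yourself; no LLM API is called):

  • MINERU_API_TOKEN: PDF parsing (required)
  • OPENAI_API_KEY (+ OPENAI_BASE_URL): used by default to generate the cover image (gpt-image-2); with no key or an unusable key, the cover falls back to local compositing (reusing an original paper figure)
  • XHS_MCP_BIN: optional, a custom location for the xiaohongshu-mcp binary; if unset, the skill downloads the binary for your platform into ~/.paper2anything/xhs/ automatically at publish time. XHS_MCP_URL is also optional (custom service address/port, default http://localhost:18060).

Dependency self-check (install whatever is missing as prompted; all dependencies are in environment.yml):

conda run -n paper2anything --no-capture-output python -c "import requests, rich, dotenv" 2>&1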

Step 1: Parse the PDF (script)

pdf_path="/path/to/paper.pdf"          # ← the user's paper PDF
WORKDIR="$(dirname "$pdf_path")/.paper2anything/xhs/$(basename "${pdf_path%.*}")"
conda run -n paper2anything --no-capture-output \
  python "${SKILL_DIR}/scripts/parse_pdf.py" "$pdf_path" --workdir "$WORKDIR"

Outputs (under $WORKDIR):

  • parsed/paper_meta.json (title / authors / abstract), parsed/sections.json ([{title, content}]), parsed/figures_index.json ([{figure_id, caption, image_path, page}], with image_path already pointing at the files under figures/), parsed/references.json
  • figures/*: the paper's figure files

After parsing, first Read parsed/sections.json and parsed/paper_meta.json to go through the full text.


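Step 2: Understand the Paper → Write understanding (you do this) [confirm]

This is the foundation of the post. Make the judgments yourself rather than handing them to a script:

  1. Read parsed/sections.json (full text) + parsed/paper_meta.json. Read parsed/figures_index.json for the captions (a few figures may have empty captions, so go by what the figure actually shows), and actually Read a few candidate images (under figures/) to judge which are clear and suitable for the cover or as illustrations. A figure whose caption says "framework" may not look good at thumbnail size; only your eyes can judge that.
  2. Write to understanding/paper_understanding.json, with this schema:
    {
      "paper_title": "...", "method_name": "short method name (e.g. AccKV)",
      "one_sentence_summary": "what this paper does, in one sentence",
      "problem": "what problem it solves", "method": "how it does it",
      "highlights": ["data-backed highlight 1", "innovation 2", "application value 3"],
      "experiment_results": ["key data 1 (with numbers)", "..."],
      "keywords": ["domain keyword", "..."],
      "cover_palette": {"bg": "#F4F5F7", "accent": "#2E86AB"},
      "important_figures": [
        {"figure_id": "fig_1", "image_path": "<the real path from figures_index.json>",
         "suitable_for_cover": true, "importance_score": 0.9, "description": "figure description"}
      ]
    }
    
    • important_figures must include image_path (taken from parsed/figures_index.json and pointing at a figure that actually exists), suitable_for_cover, and importance_score. The cover goes through API image generation (gpt-image-2) by default; only when OPENAI_API_KEY is unset or unusable does it fall back to local compositing, which relies on these fields to reuse an original figure. If they are missing, the fallback has no image → cover skipped.
    • cover_palette (optional): the color scheme for the local-compositing fallback. Choose bg (light base) + accent (highlight color) by the paper's field; the title text color adapts automatically to how dark the background is. Suggested light palettes: general #F4F5F7+#2E86AB, biology #EEF6F0+#2D8A5F, physics/math #F1ECF8+#6A30C2, engineering #FBF0EC+#D85A3C, social sciences #F4EEF2+#8A5A78, chemistry #EAF4F8+#0E86C0.
  3. Use AskUserQuestion to confirm the angle with the user: which highlight the post leads with, what hook to use, and which readers to target. Write the copy with the confirmed answers in hand.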

Step 3:写小红书帖子(你来做)[确认]

按小红书风格亲自撰写,用 Writexhs_post.jsonxhs_post.md

小红书文案规则(领域知识):

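  • Title ≤ 20 characters and eye-catching: include the core value, a number, a contrast, or a suspense-style question.
  • Body 300–600 characters, structured as:
    1. An opening hook of 1–2 sentences to grab attention
    2. What the paper is and what problem it solves (2–3 sentences)
    3. 3–5 core highlights, each starting with an emoji, short and punchy
    4. 1–3 key experimental results, stated concretely
    5. What it means for the reader (1–2 sentences)
    6. A closing line that invites interaction (e.g. "Where do you think this method could be used?")
  • Style: conversational and easy to read, not academic in tone, but faithful and accurate: no exaggeration, no invented numbers.
  • 8–12 hashtags, placed at the end of the body; put the same tags in the hashtags field (the publishing script reads hashtags).
  • Cover text cover_text ≤ 15 characters (used as the large text on the cover).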

Output schema: xhs_post.json

{"title": "...", "body": "full body with emoji/line breaks, ending with the hashtags",
 "hashtags": ["#tag1", "#tag2"], "cover_text": "cover phrase, ≤15 chars", "paper_title_zh": "Chinese title of the paper"}

xhs_post.md: the first line is # {title}, followed by the body. You may put a ![封面](cover.png) placeholder at the top (the cover is generated in Step 4).
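If you would rather produce both files from one dict than write them separately, here is a minimal sketch. The helper names render_xhs_markdown and write_xhs_files are hypothetical. Placing the cover placeholder directly under the title line is this sketch's reading of "at the top", chosen so that # {title} stays the first line.

```python
import json
import os


def render_xhs_markdown(post, with_cover=True):
    """Title line first, optional cover placeholder, then the body."""
    lines = [f"# {post['title']}", ""]
    if with_cover:
        # Relative reference, so the .md and cover.png must stay in one folder.
        lines += ["![封面](cover.png)", ""]
    lines.append(post["body"].rstrip("\n"))
    return "\n".join(lines) + "\n"


def write_xhs_files(workdir, post, with_cover=True):
    """Write xhs_post.json and xhs_post.md side by side in workdir."""
    with open(os.path.join(workdir, "xhs_post.json"), "w", encoding="utf-8") as f:
        json.dump(post, f, ensure_ascii=False, indent=2)
    with open(os.path.join(workdir, "xhs_post.md"), "w", encoding="utf-8") as f:
        f.write(render_xhs_markdown(post, with_cover))
```

ensure_ascii=False keeps the Chinese copy and emoji readable in the JSON instead of escaping them.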

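When done, use AskUserQuestion to show the user the title and a summary of the body, then get confirmation or revise based on their feedback (you can edit the JSON/MD directly).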
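Before showing the draft, you can check the measurable rules mechanically. The sketch below encodes only the numeric limits and the key names from the rules and schema above. Two choices here are assumptions. Lengths are counted as Python code points, so an emoji with modifiers counts as several. The body count covers the whole body field, hashtags included. The function check_xhs_post is a hypothetical helper.

```python
REQUIRED = ["title", "body", "hashtags", "cover_text", "paper_title_zh"]


def check_xhs_post(post):
    """Return rule violations for an xhs_post.json dict (empty list = OK)."""
    problems = [f"missing key: {k}" for k in REQUIRED if k not in post]
    title = post.get("title", "")
    body = post.get("body", "")
    tags = post.get("hashtags", [])
    cover = post.get("cover_text", "")
    if len(title) > 20:
        problems.append(f"title has {len(title)} chars (max 20)")
    if not 300 <= len(body) <= 600:
        problems.append(f"body has {len(body)} chars (expected 300-600)")
    if not 8 <= len(tags) <= 12:
        problems.append(f"{len(tags)} hashtags (expected 8-12)")
    for tag in tags:
        if not tag.startswith("#"):
            problems.append(f"hashtag without '#': {tag}")
        elif tag not in body:
            # The rules require the same tags at the end of the body.
            problems.append(f"hashtag not in body: {tag}")
    if len(cover) > 15:
        problems.append(f"cover_text has {len(cover)} chars (max 15)")
    return problems
```

The qualitative rules (hook, tone, no invented numbers) remain your judgment; this only catches the counts.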


Step 4: Generate the cover (script, optional)

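Write the cover title and subtitle yourself at this point; you have read the paper closely, so this fits better than lifting text from the JSON. Pass them via --title (large main title) and --subtitle (smaller subtitle):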

pdf_path="/path/to/paper.pdf"
WORKDIR="$(dirname "$pdf_path")/.paper2anything/xhs/$(basename "${pdf_path%.*}")"
conda run -n paper2anything --no-capture-output \
  python "${SKILL_DIR}/scripts/cover.py" --workdir "$WORKDIR" \
  --title "<your cover title (large text)>" --subtitle "<your subtitle (small text)>"

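Logic: by default the script uses OPENAI_IMAGE_MODEL (default gpt-image-2) to generate a portrait cover. The large main title comes from your --title and the small subtitle from --subtitle; only if these are left empty does it fall back to xhs_post.cover_text and the paper title, respectively. If OPENAI_API_KEY is not configured or the key is unusable, it falls back to local compositing. That path reuses the original paper figure in understanding.important_figures that has suitable_for_cover set and the highest score, overlays --title on it, and takes its colors from cover_palette. If neither path is available, the cover is skipped (this does not block the workflow). Output: cover.png.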
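The decision order above can be sketched as follows. This is not cover.py's code. The source does not say how the title color "adapts" to the background, so this sketch swaps in the WCAG 2.x contrast-ratio rule. That rule picks whichever of a dark or white text color contrasts more with bg. Three more pieces are assumptions: the default palette (the "general" one from Step 2), the existence check on image_path, and the helper names pick_cover_plan / title_color.

```python
import os


def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of a #RRGGBB color (0 = black, 1 = white)."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))

    def linear(c):
        # sRGB channel value -> linear light
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b)


def title_color(bg, dark="#1A1A1A", light="#FFFFFF"):
    """Pick whichever text color has the higher WCAG contrast ratio against bg."""
    lum = relative_luminance(bg)
    contrast_dark = (lum + 0.05) / (relative_luminance(dark) + 0.05)
    contrast_light = (relative_luminance(light) + 0.05) / (lum + 0.05)
    return dark if contrast_dark >= contrast_light else light


DEFAULT_PALETTE = {"bg": "#F4F5F7", "accent": "#2E86AB"}  # "general" palette


def pick_cover_plan(understanding, api_key_ok, workdir, exists=os.path.isfile):
    """API image first, then a local composite from the best figure, else skipped."""
    if api_key_ok:
        return {"mode": "api"}
    candidates = [
        f for f in understanding.get("important_figures") or []
        if f.get("suitable_for_cover") and f.get("image_path")
        and exists(os.path.join(workdir, f["image_path"]))
    ]
    if not candidates:
        return {"mode": "skipped"}
    best = max(candidates, key=lambda f: f.get("importance_score", 0))
    palette = {**DEFAULT_PALETTE, **(understanding.get("cover_palette") or {})}
    return {
        "mode": "local",
        "figure": best["image_path"],
        "bg": palette["bg"],
        "accent": palette["accent"],
        "title_color": title_color(palette["bg"]),
    }
```

All of the suggested palettes use light backgrounds, so under this rule they all get the dark title color.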


Step 5: Publish to Xiaohongshu (script + your coordination, optional)

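Publishing goes through the open-source xiaohongshu-mcp (a single binary with a bundled headless Chromium, plus a REST API). After you log in once, the cookies persist and later runs need no login. Step ① sets up the binary automatically (set XHS_MCP_BIN only for a custom location; see Step 0). Full per-environment instructions for first-time setup and login are in references/publish-guide.md; Read it first. If you are not publishing, skip this step and give the user the output paths so they can post manually.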

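① Make sure the mcp binary is in place and runs from a fixed, persistent directory (it is downloaded automatically if missing; the cookies are stored here and reused across papers):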

export XHS_MCP_DIR="$HOME/.paper2anything/xhs"; mkdir -p "$XHS_MCP_DIR"
# Resolve the binary: prefer XHS_MCP_BIN from .env; else the one in the persistent dir; if neither exists, download it for this platform
if [ -n "$XHS_MCP_BIN" ] && [ -x "$XHS_MCP_BIN" ]; then BIN="$XHS_MCP_BIN"; else
  case "$(uname -s)-$(uname -m)" in
    Linux-x86_64)  ASSET=xiaohongshu-mcp-linux-amd64 ;;
    Darwin-arm64)  ASSET=xiaohongshu-mcp-darwin-arm64 ;;
    Darwin-x86_64) ASSET=xiaohongshu-mcp-darwin-amd64 ;;
    *) ASSET= ; echo "Unknown platform: download xiaohongshu-mcp manually and set XHS_MCP_BIN in .env" ;;
  esac
  BIN="$XHS_MCP_DIR/$ASSET"
  if [ -n "$ASSET" ] && [ ! -x "$BIN" ]; then
    echo "mcp binary not found, downloading $ASSET …"
    curl -fL -o "$XHS_MCP_DIR/$ASSET.tar.gz" "https://github.com/xpzouying/xiaohongshu-mcp/releases/latest/download/$ASSET.tar.gz" \
      && tar xzf "$XHS_MCP_DIR/$ASSET.tar.gz" -C "$XHS_MCP_DIR" && chmod +x "$BIN"
  fi
fi
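# Start the service (skip if already running; if BIN is unusable, report an error instead of forcing a start)
if ! curl -sf http://localhost:18060/api/v1/login/status >/dev/null 2>&1; then
  # Check -f as well as -x: on an unknown platform BIN is the directory "$XHS_MCP_DIR/", which passes -x on its own
  if [ ! -f "$BIN" ] || [ ! -x "$BIN" ]; then
    echo "mcp binary unavailable ($BIN): download failed or platform unsupported, cannot publish; download it manually and set XHS_MCP_BIN, see references/publish-guide.md"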
  else
    ( cd "$XHS_MCP_DIR" && nohup "$BIN" -port=:18060 > mcp.log 2>&1 & )
    for i in $(seq 1 30); do curl -sf http://localhost:18060/api/v1/login/status >/dev/null 2>&1 && break; sleep 2; done
  fi
fi

(The first run downloads the mcp binary plus its Chromium (about 150 MB), which may take a while; logs are in $XHS_MCP_DIR/mcp.log. If macOS Gatekeeper blocks it, run xattr -c "$BIN".)
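If a helper script needs the same platform check from Python, the case statement above maps directly onto a lookup table. The sketch mirrors the three supported platforms and the release URL pattern from ①. mcp_asset is a hypothetical name. It relies on platform.system() / platform.machine() returning the same strings as uname -s / uname -m, which holds for Linux and macOS.

```python
import platform

# Same mapping as the case statement in ①: (uname -s, uname -m) -> release asset.
ASSETS = {
    ("Linux", "x86_64"): "xiaohongshu-mcp-linux-amd64",
    ("Darwin", "arm64"): "xiaohongshu-mcp-darwin-arm64",
    ("Darwin", "x86_64"): "xiaohongshu-mcp-darwin-amd64",
}
RELEASE_URL = ("https://github.com/xpzouying/xiaohongshu-mcp/releases/latest/"
               "download/{asset}.tar.gz")


def mcp_asset(system=None, machine=None):
    """Return (asset_name, download_url), or None on an unsupported platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    asset = ASSETS.get((system, machine))
    if asset is None:
        return None  # e.g. Linux aarch64: download manually and set XHS_MCP_BIN
    return asset, RELEASE_URL.format(asset=asset)
```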

② Check login status

conda run -n paper2anything --no-capture-output python "${SKILL_DIR}/scripts/publish.py" --check-only

Logged in → skip to ④. Not logged in → go to ③.

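③ Log in (only the first time, or when the session has expired). Logging in requires starting mcp with a UI/monitor, so first stop the instance started in ①. Match the exact process name; do not use pkill -f, which matches full command lines and can kill your own shell: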

pkill -x xiaohongshu-mcp; sleep 1

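Then follow references/publish-guide.md for your environment. Key points for a headless server: restart mcp with -rod "monitor=:9273" (keeping the default headless mode) → run xhs_login.py to get the QR code → use SendUserFile to send qr.png to the user, reminding them that the first time they may first need to scan a "new device verification" code in the browser UI on the monitor port (:9273) → use AskUserQuestion to wait until the user confirms the scan → watch for the cookies file to be written → on success, stop mcp with pkill -x xiaohongshu-mcp and go back to ① to restart it (without monitor, so it loads the cookies).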

conda run -n paper2anything --no-capture-output python "${SKILL_DIR}/scripts/xhs_login.py" \
  --out "$XHS_MCP_DIR/qr.png" --cookies "$XHS_MCP_DIR/cookies.json" --wait
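xhs_login.py's --wait flag presumably already handles the "watch for the cookies to be written" step. If you need to watch from your own code, this is a minimal polling sketch. It assumes the cookies file holds non-empty JSON once login completes. The helper names cookies_ready / wait_for_cookies and the timeout defaults are hypothetical.

```python
import json
import time


def cookies_ready(path):
    """True once the cookies file exists and holds non-empty JSON."""
    try:
        with open(path, encoding="utf-8") as f:
            return bool(json.load(f))
    except (OSError, ValueError):
        # Missing, still being written, or not valid JSON yet.
        return False


def wait_for_cookies(path, timeout=300.0, interval=2.0):
    """Poll until cookies_ready(path) or the timeout (seconds) elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if cookies_ready(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Catching ValueError also covers a half-written file, since json.JSONDecodeError is a subclass of it.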

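④ Let the user review before publishing: Read xhs_post.json, show the user the title + body, and send cover.png with SendUserFile. Then use AskUserQuestion to have the user confirm publishing and choose the visibility: 「公开可见」 (public, the default), 「仅自己可见」 (only me), or 「仅互关好友可见」 (mutual followers only).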

⑤ Publish (passing the visibility the user chose):

pdf_path="/path/to/paper.pdf"
WORKDIR="$(dirname "$pdf_path")/.paper2anything/xhs/$(basename "${pdf_path%.*}")"
conda run -n paper2anything --no-capture-output \
  python "${SKILL_DIR}/scripts/publish.py" --workdir "$WORKDIR" --visibility "公开可见"

Once it returns 「发布成功」 (published successfully), you are done.
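When the visibility comes back from AskUserQuestion, it is safer to check it against the three options and build the command as an argument list than to splice it into a shell string. The sketch reproduces the ⑤ command exactly as written above. The English glosses in VISIBILITY_OPTIONS are for readability only. publish_command is a hypothetical helper.

```python
VISIBILITY_OPTIONS = {
    "公开可见": "public",
    "仅自己可见": "only me",
    "仅互关好友可见": "mutual followers only",
}


def publish_command(skill_dir, workdir, visibility="公开可见"):
    """Build the ⑤ command as an argv list (no shell quoting needed)."""
    if visibility not in VISIBILITY_OPTIONS:
        raise ValueError(
            f"unknown visibility {visibility!r}; expected one of {list(VISIBILITY_OPTIONS)}")
    return ["conda", "run", "-n", "paper2anything", "--no-capture-output",
            "python", f"{skill_dir}/scripts/publish.py",
            "--workdir", workdir, "--visibility", visibility]
```

Run it with subprocess.run(publish_command(skill_dir, workdir, choice), check=True), passing a real path for skill_dir; argv lists are not shell-expanded, so "${SKILL_DIR}" would not be substituted.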


Step 6: Collect the final outputs next to the PDF

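By default the outputs are buried in .paper2anything/xhs/<stem>/, where they are hard to find. Once the copy and cover are final (whether or not you published in Step 5), copy them into a <stem>_xhs/ directory at the same level as the PDF, leaving the originals inside .paper2anything untouched, so the user can pick them up right next to the paper: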

pdf_path="/path/to/paper.pdf"
WORKDIR="$(dirname "$pdf_path")/.paper2anything/xhs/$(basename "${pdf_path%.*}")"
DEST="${pdf_path%.*}_xhs"             # same directory and name as the PDF, plus an _xhs suffix
i=2; while [ -e "$DEST" ]; do DEST="${pdf_path%.*}_xhs_v$i"; i=$((i+1)); done   # on a name clash, try _v2, _v3, …
mkdir -p "$DEST"
cp "$WORKDIR/xhs_post.md" "$WORKDIR/xhs_post.json" "$DEST/"
[ -f "$WORKDIR/cover.png" ] && cp "$WORKDIR/cover.png" "$DEST/"   # the cover may have been skipped, so copy it only if it exists

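xhs_post.md references the cover by the relative path ![封面](cover.png), so the copy and the cover go together into the <stem>_xhs/ subdirectory and the reference stays intact.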
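The same collection step as a Python sketch, for cases where you are already in a helper script. It follows the shell behavior: the .md and .json are required (copy2 raises if either is missing), cover.png is optional, and a taken name moves on to _v2, _v3, and so on. One small difference: os.path.splitext only strips a real file extension. The shell's ${pdf_path%.*} can also cut at a dot in a directory name when the file has no extension. next_dest and collect_outputs are hypothetical names.

```python
import os
import shutil


def next_dest(pdf_path, exists=os.path.exists):
    """<stem>_xhs next to the PDF, or <stem>_xhs_v2, _v3, ... if already taken."""
    base = os.path.splitext(pdf_path)[0] + "_xhs"
    dest, i = base, 2
    while exists(dest):
        dest, i = f"{base}_v{i}", i + 1
    return dest


def collect_outputs(pdf_path, workdir):
    """Copy the finished post (and cover, if any) into a fresh folder by the PDF."""
    dest = next_dest(pdf_path)
    os.makedirs(dest)
    for name in ("xhs_post.md", "xhs_post.json"):
        shutil.copy2(os.path.join(workdir, name), dest)  # required outputs
    cover = os.path.join(workdir, "cover.png")
    if os.path.exists(cover):  # Step 4 may have skipped the cover
        shutil.copy2(cover, dest)
    return dest
```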


Output locations

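Intermediate artifacts go next to the paper in <pdf_dir>/.paper2anything/xhs/<stem>/ (several papers in the same directory are kept apart by <stem> and do not overwrite each other); the final outputs are additionally copied to <stem>_xhs/ alongside the PDF (Step 6):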

| Path | Contents | Written by |
|---|---|---|
| .paper2anything/xhs/<stem>/parsed/ | MinerU PIR (meta/sections/figures_index/references) | parse_pdf |
| .paper2anything/xhs/<stem>/figures/ | Paper figure image files | parse_pdf |
| .paper2anything/xhs/<stem>/understanding/paper_understanding.json | Paper understanding + important_figures | You (Step 2) |
| .paper2anything/xhs/<stem>/xhs_post.json, xhs_post.md | Xiaohongshu copy | You (Step 3) |
| .paper2anything/xhs/<stem>/cover.png | Cover | cover |
| .paper2anything/xhs/<stem>/logs/ | *_result.json from each script | Scripts |
| <pdf_dir>/<stem>_xhs/ | Collected final outputs (xhs_post.md + .json + cover.png), alongside the PDF | You (Step 6) |

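Re-running overwrites the workspace .paper2anything/xhs/<stem>/ (intermediate artifacts). If a <stem>_xhs/ with the same name already exists, the collection step saves to _v2, _v3, and so on instead of overwriting the earlier outputs.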


Troubleshooting

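  • MinerU parsing fails: check MINERU_API_TOKEN in .env (apply for one at https://mineru.net); the PDF should be ≤ 200 MB and ≤ 200 pages; make sure mineru.net is reachable. Then simply re-run Step 1 (it overwrites the previous output).
  • No cover generated (skipped): usually there is neither a usable OPENAI_API_KEY nor an original paper figure to reuse. Configure a key to use AI image generation, or make sure understanding.important_figures contains a figure with suitable_for_cover: true and an existing image_path for the local-compositing fallback.
  • The publish step reports "not logged in" → log in following references/publish-guide.md (on first login, watch for the "new device verification"). Cannot connect to mcp → check whether ① actually started the service (for binary download or startup failures, see $XHS_MCP_DIR/mcp.log). After a successful login you must restart mcp so it loads the cookies. If you are not publishing, skip Step 5 and post the outputs manually.
  • Understanding and copywriting need no API key: you do these two steps yourself, and they do not call any LLM API.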
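To catch the most common MinerU failures before uploading, a small preflight can check the token and the file size. This is a sketch under stated assumptions. It only sees MINERU_API_TOKEN if .env has been exported into the environment. Treating "200 MB" as 200 × 1024 × 1024 bytes is also an assumption. The 200-page limit is not checked, because counting pages needs a third-party PDF library. preflight is a hypothetical helper.

```python
import os


def preflight(pdf_path, env=None, size_limit=200 * 1024 * 1024,
              getsize=os.path.getsize):
    """Return a list of problems that would make MinerU parsing fail early."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("MINERU_API_TOKEN"):
        problems.append("MINERU_API_TOKEN is not set")
    try:
        size = getsize(pdf_path)
    except OSError:
        problems.append(f"cannot read {pdf_path}")
    else:
        if size > size_limit:
            problems.append(f"PDF is {size / 2**20:.1f} MB (limit 200 MB)")
    return problems
```

An empty list does not guarantee success (network access to mineru.net and the page limit still matter), but a non-empty one explains a failure before you spend an upload on it.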
