dadoonet/fscrawler

FS Crawler is a file system crawler for Elasticsearch. It indexes binary files such as PDF and Office documents, and it can crawl both local and remote file systems.

Tech stack

beans java

Dependencies (4)

fr.pilato.elasticsearch.crawler:fscrawler-framework fr.pilato.elasticsearch.crawler:fscrawler-settings fr.pilato.elasticsearch.crawler:fscrawler-test-documents fr.pilato.elasticsearch.crawler:fscrawler-test-framework

cli java

Dependencies (8)

com.beust:jcommander fr.pilato.elasticsearch.crawler:fscrawler-core fr.pilato.elasticsearch.crawler:fscrawler-fs-http-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-local-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-s3-plugin fr.pilato.elasticsearch.crawler:fscrawler-rest fr.pilato.elasticsearch.crawler:fscrawler-test-framework org.apache.logging.log4j:log4j-slf4j-impl

contrib java

Dependencies (1)

org.elasticsearch:rest-api-spec

core java

Dependencies (12)

fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client fr.pilato.elasticsearch.crawler:fscrawler-framework fr.pilato.elasticsearch.crawler:fscrawler-plugin fr.pilato.elasticsearch.crawler:fscrawler-settings fr.pilato.elasticsearch.crawler:fscrawler-test-documents fr.pilato.elasticsearch.crawler:fscrawler-test-framework fr.pilato.elasticsearch.crawler:fscrawler-tika io.opentelemetry:opentelemetry-api org.glassfish.jersey.containers:jersey-container-grizzly2-http org.glassfish.jersey.inject:jersey-hk2 org.glassfish.jersey.media:jersey-media-json-jackson org.glassfish.jersey.media:jersey-media-multipart

distribution java

Dependencies (2)

fr.pilato.elasticsearch.crawler:fscrawler-cli fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client

elasticsearch-client java

Dependencies (12)

com.fasterxml.jackson.module:jackson-module-jaxb-annotations com.jayway.jsonpath:json-path fr.pilato.elasticsearch.crawler:fscrawler-beans fr.pilato.elasticsearch.crawler:fscrawler-framework fr.pilato.elasticsearch.crawler:fscrawler-settings fr.pilato.elasticsearch.crawler:fscrawler-test-framework org.awaitility:awaitility org.glassfish.jersey.core:jersey-client org.glassfish.jersey.inject:jersey-hk2 org.glassfish.jersey.media:jersey-media-json-jackson org.testcontainers:testcontainers-nginx org.wiremock:wiremock-standalone

framework java

Dependencies (16)

com.fasterxml.jackson.core:jackson-annotations com.fasterxml.jackson.core:jackson-core com.fasterxml.jackson.core:jackson-databind com.fasterxml.jackson.dataformat:jackson-dataformat-xml com.fasterxml.jackson.dataformat:jackson-dataformat-yaml com.fasterxml.jackson.datatype:jackson-datatype-jsr310 com.jayway.jsonpath:json-path commons-io:commons-io fr.pilato.elasticsearch.crawler:fscrawler-test-framework io.opentelemetry:opentelemetry-api org.apache.logging.log4j:log4j-core org.apache.logging.log4j:log4j-jcl org.apache.logging.log4j:log4j-jul org.apache.logging.log4j:log4j-slf4j2-impl org.awaitility:awaitility org.pf4j:pf4j

integration-tests java

Dependencies (15)

fr.pilato.elasticsearch.crawler:fscrawler-core fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client fr.pilato.elasticsearch.crawler:fscrawler-fs-ftp-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-http-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-local-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-s3-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-ssh-plugin fr.pilato.elasticsearch.crawler:fscrawler-rest fr.pilato.elasticsearch.crawler:fscrawler-test-documents fr.pilato.elasticsearch.crawler:fscrawler-test-framework org.apache.tika:tika-core org.mockftpserver:MockFtpServer org.testcontainers:testcontainers-elasticsearch org.testcontainers:testcontainers-minio org.testcontainers:testcontainers-nginx

plugin java

Dependencies (2)

fr.pilato.elasticsearch.crawler:fscrawler-beans fr.pilato.elasticsearch.crawler:fscrawler-settings

plugins/fs-ftp-plugin java

Dependencies (3)

commons-net:commons-net org.apache.logging.log4j:log4j-iostreams org.mockftpserver:MockFtpServer

plugins/fs-http-plugin java

Dependencies (1)

org.testcontainers:testcontainers-nginx

plugins/fs-s3-plugin java

Dependencies (3)

com.squareup.okhttp3:okhttp-jvm io.minio:minio org.testcontainers:testcontainers-minio

plugins/fs-ssh-plugin java

Dependencies (1)

org.apache.sshd:sshd-sftp

plugins java

Dependencies (3)

fr.pilato.elasticsearch.crawler:fscrawler-framework fr.pilato.elasticsearch.crawler:fscrawler-plugin fr.pilato.elasticsearch.crawler:fscrawler-test-framework

root directory java

Dependencies (80)

com.beust:jcommander com.carrotsearch.randomizedtesting:randomizedtesting-runner com.fasterxml.jackson:jackson-bom com.github.gestalt-config:gestalt-core com.github.gestalt-config:gestalt-json com.github.gestalt-config:gestalt-yaml com.google.guava:guava com.google.protobuf:protobuf-java com.jayway.jsonpath:json-path com.squareup.okhttp3:okhttp-jvm com.squareup.okio:okio-jvm com.sun.activation:jakarta.activation commons-io:commons-io commons-logging:commons-logging commons-net:commons-net fr.pilato.elasticsearch.crawler:fscrawler-beans fr.pilato.elasticsearch.crawler:fscrawler-cli fr.pilato.elasticsearch.crawler:fscrawler-core fr.pilato.elasticsearch.crawler:fscrawler-crawler fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client-base fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client-v7 fr.pilato.elasticsearch.crawler:fscrawler-framework fr.pilato.elasticsearch.crawler:fscrawler-fs-ftp-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-http-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-local-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-s3-plugin fr.pilato.elasticsearch.crawler:fscrawler-fs-ssh-plugin fr.pilato.elasticsearch.crawler:fscrawler-it-common fr.pilato.elasticsearch.crawler:fscrawler-plugin fr.pilato.elasticsearch.crawler:fscrawler-plugins fr.pilato.elasticsearch.crawler:fscrawler-rest fr.pilato.elasticsearch.crawler:fscrawler-settings fr.pilato.elasticsearch.crawler:fscrawler-test-documents fr.pilato.elasticsearch.crawler:fscrawler-test-framework fr.pilato.elasticsearch.crawler:fscrawler-tika fr.pilato.elasticsearch.crawler:fscrawler-welcome-plugin io.minio:minio io.opentelemetry:opentelemetry-bom jakarta.activation:jakarta.activation-api jakarta.annotation:jakarta.annotation-api jakarta.xml.bind:jakarta.xml.bind-api javax.xml.bind:jaxb-api joda-time:joda-time junit:junit net.minidev:json-smart org.apache.commons:commons-lang3 org.apache.logging.log4j:log4j-1.2-api org.apache.logging.log4j:log4j-api org.apache.logging.log4j:log4j-core org.apache.logging.log4j:log4j-iostreams org.apache.logging.log4j:log4j-jcl org.apache.logging.log4j:log4j-jul org.apache.logging.log4j:log4j-slf4j2-impl org.apache.sshd:sshd-sftp org.apache.tika:tika-core org.apache.tika:tika-langdetect-optimaize org.apache.tika:tika-parent org.apache.tika:tika-parser-scientific-module org.apache.tika:tika-parser-sqlite3-module org.apache.tika:tika-parsers-standard-package org.assertj:assertj-core org.awaitility:awaitility org.bouncycastle:bcprov-jdk18on org.glassfish.jersey.containers:jersey-container-grizzly2-http org.glassfish.jersey.core:jersey-client org.glassfish.jersey.inject:jersey-hk2 org.glassfish.jersey.media:jersey-media-json-jackson org.glassfish.jersey.media:jersey-media-multipart org.jetbrains.kotlin:kotlin-stdlib org.jetbrains:annotations org.mockftpserver:MockFtpServer org.ow2.asm:asm org.pf4j:pf4j org.testcontainers:testcontainers org.testcontainers:testcontainers-elasticsearch org.testcontainers:testcontainers-minio org.testcontainers:testcontainers-nginx org.wiremock:wiremock-standalone org.yaml:snakeyaml

rest java

Dependencies (10)

com.fasterxml.jackson.module:jackson-module-jaxb-annotations fr.pilato.elasticsearch.crawler:fscrawler-core fr.pilato.elasticsearch.crawler:fscrawler-plugin fr.pilato.elasticsearch.crawler:fscrawler-test-documents fr.pilato.elasticsearch.crawler:fscrawler-test-framework javax.xml.bind:jaxb-api org.glassfish.jersey.containers:jersey-container-grizzly2-http org.glassfish.jersey.inject:jersey-hk2 org.glassfish.jersey.media:jersey-media-json-jackson org.glassfish.jersey.media:jersey-media-multipart

settings java

Dependencies (7)

com.github.gestalt-config:gestalt-core com.github.gestalt-config:gestalt-json com.github.gestalt-config:gestalt-yaml fr.pilato.elasticsearch.crawler:fscrawler-framework fr.pilato.elasticsearch.crawler:fscrawler-test-framework jakarta.annotation:jakarta.annotation-api org.yaml:snakeyaml

test-framework java

Dependencies (8)

com.carrotsearch.randomizedtesting:randomizedtesting-runner commons-io:commons-io org.apache.logging.log4j:log4j-core org.apache.logging.log4j:log4j-jcl org.apache.logging.log4j:log4j-jul org.assertj:assertj-core org.awaitility:awaitility org.testcontainers:testcontainers-elasticsearch

tika java

Dependencies (11)

fr.pilato.elasticsearch.crawler:fscrawler-beans fr.pilato.elasticsearch.crawler:fscrawler-framework fr.pilato.elasticsearch.crawler:fscrawler-settings fr.pilato.elasticsearch.crawler:fscrawler-test-documents fr.pilato.elasticsearch.crawler:fscrawler-test-framework io.opentelemetry:opentelemetry-api org.apache.tika:tika-core org.apache.tika:tika-langdetect-optimaize org.apache.tika:tika-parser-scientific-module org.apache.tika:tika-parser-sqlite3-module org.apache.tika:tika-parsers-standard-package
