dadoonet/fscrawler
FS Crawler is a file system crawler for Elasticsearch. It indexes binary documents such as PDF and Office files, and it can crawl both local and remote file systems.
Tech stack
beans (Java)
Dependencies (4)
fr.pilato.elasticsearch.crawler:fscrawler-framework
fr.pilato.elasticsearch.crawler:fscrawler-settings
fr.pilato.elasticsearch.crawler:fscrawler-test-documents
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
cli (Java)
Dependencies (8)
com.beust:jcommander
fr.pilato.elasticsearch.crawler:fscrawler-core
fr.pilato.elasticsearch.crawler:fscrawler-fs-http-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-local-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-s3-plugin
fr.pilato.elasticsearch.crawler:fscrawler-rest
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
org.apache.logging.log4j:log4j-slf4j-impl
contrib (Java)
Dependencies (1)
org.elasticsearch:rest-api-spec
core (Java)
Dependencies (12)
fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client
fr.pilato.elasticsearch.crawler:fscrawler-framework
fr.pilato.elasticsearch.crawler:fscrawler-plugin
fr.pilato.elasticsearch.crawler:fscrawler-settings
fr.pilato.elasticsearch.crawler:fscrawler-test-documents
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
fr.pilato.elasticsearch.crawler:fscrawler-tika
io.opentelemetry:opentelemetry-api
org.glassfish.jersey.containers:jersey-container-grizzly2-http
org.glassfish.jersey.inject:jersey-hk2
org.glassfish.jersey.media:jersey-media-json-jackson
org.glassfish.jersey.media:jersey-media-multipart
distribution (Java)
Dependencies (2)
fr.pilato.elasticsearch.crawler:fscrawler-cli
fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client
elasticsearch-client (Java)
Dependencies (12)
com.fasterxml.jackson.module:jackson-module-jaxb-annotations
com.jayway.jsonpath:json-path
fr.pilato.elasticsearch.crawler:fscrawler-beans
fr.pilato.elasticsearch.crawler:fscrawler-framework
fr.pilato.elasticsearch.crawler:fscrawler-settings
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
org.awaitility:awaitility
org.glassfish.jersey.core:jersey-client
org.glassfish.jersey.inject:jersey-hk2
org.glassfish.jersey.media:jersey-media-json-jackson
org.testcontainers:testcontainers-nginx
org.wiremock:wiremock-standalone
framework (Java)
Dependencies (16)
com.fasterxml.jackson.core:jackson-annotations
com.fasterxml.jackson.core:jackson-core
com.fasterxml.jackson.core:jackson-databind
com.fasterxml.jackson.dataformat:jackson-dataformat-xml
com.fasterxml.jackson.dataformat:jackson-dataformat-yaml
com.fasterxml.jackson.datatype:jackson-datatype-jsr310
com.jayway.jsonpath:json-path
commons-io:commons-io
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
io.opentelemetry:opentelemetry-api
org.apache.logging.log4j:log4j-core
org.apache.logging.log4j:log4j-jcl
org.apache.logging.log4j:log4j-jul
org.apache.logging.log4j:log4j-slf4j2-impl
org.awaitility:awaitility
org.pf4j:pf4j
integration-tests (Java)
Dependencies (15)
fr.pilato.elasticsearch.crawler:fscrawler-core
fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client
fr.pilato.elasticsearch.crawler:fscrawler-fs-ftp-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-http-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-local-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-s3-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-ssh-plugin
fr.pilato.elasticsearch.crawler:fscrawler-rest
fr.pilato.elasticsearch.crawler:fscrawler-test-documents
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
org.apache.tika:tika-core
org.mockftpserver:MockFtpServer
org.testcontainers:testcontainers-elasticsearch
org.testcontainers:testcontainers-minio
org.testcontainers:testcontainers-nginx
plugin (Java)
Dependencies (2)
fr.pilato.elasticsearch.crawler:fscrawler-beans
fr.pilato.elasticsearch.crawler:fscrawler-settings
plugins/fs-ftp-plugin (Java)
Dependencies (3)
commons-net:commons-net
org.apache.logging.log4j:log4j-iostreams
org.mockftpserver:MockFtpServer
plugins/fs-http-plugin (Java)
Dependencies (1)
org.testcontainers:testcontainers-nginx
plugins/fs-s3-plugin (Java)
Dependencies (3)
com.squareup.okhttp3:okhttp-jvm
io.minio:minio
org.testcontainers:testcontainers-minio
plugins/fs-ssh-plugin (Java)
Dependencies (1)
org.apache.sshd:sshd-sftp
plugins (Java)
Dependencies (3)
fr.pilato.elasticsearch.crawler:fscrawler-framework
fr.pilato.elasticsearch.crawler:fscrawler-plugin
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
Root (Java)
Dependencies (80)
com.beust:jcommander
com.carrotsearch.randomizedtesting:randomizedtesting-runner
com.fasterxml.jackson:jackson-bom
com.github.gestalt-config:gestalt-core
com.github.gestalt-config:gestalt-json
com.github.gestalt-config:gestalt-yaml
com.google.guava:guava
com.google.protobuf:protobuf-java
com.jayway.jsonpath:json-path
com.squareup.okhttp3:okhttp-jvm
com.squareup.okio:okio-jvm
com.sun.activation:jakarta.activation
commons-io:commons-io
commons-logging:commons-logging
commons-net:commons-net
fr.pilato.elasticsearch.crawler:fscrawler-beans
fr.pilato.elasticsearch.crawler:fscrawler-cli
fr.pilato.elasticsearch.crawler:fscrawler-core
fr.pilato.elasticsearch.crawler:fscrawler-crawler
fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client
fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client-base
fr.pilato.elasticsearch.crawler:fscrawler-elasticsearch-client-v7
fr.pilato.elasticsearch.crawler:fscrawler-framework
fr.pilato.elasticsearch.crawler:fscrawler-fs-ftp-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-http-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-local-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-s3-plugin
fr.pilato.elasticsearch.crawler:fscrawler-fs-ssh-plugin
fr.pilato.elasticsearch.crawler:fscrawler-it-common
fr.pilato.elasticsearch.crawler:fscrawler-plugin
fr.pilato.elasticsearch.crawler:fscrawler-plugins
fr.pilato.elasticsearch.crawler:fscrawler-rest
fr.pilato.elasticsearch.crawler:fscrawler-settings
fr.pilato.elasticsearch.crawler:fscrawler-test-documents
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
fr.pilato.elasticsearch.crawler:fscrawler-tika
fr.pilato.elasticsearch.crawler:fscrawler-welcome-plugin
io.minio:minio
io.opentelemetry:opentelemetry-bom
jakarta.activation:jakarta.activation-api
jakarta.annotation:jakarta.annotation-api
jakarta.xml.bind:jakarta.xml.bind-api
javax.xml.bind:jaxb-api
joda-time:joda-time
junit:junit
net.minidev:json-smart
org.apache.commons:commons-lang3
org.apache.logging.log4j:log4j-1.2-api
org.apache.logging.log4j:log4j-api
org.apache.logging.log4j:log4j-core
org.apache.logging.log4j:log4j-iostreams
org.apache.logging.log4j:log4j-jcl
org.apache.logging.log4j:log4j-jul
org.apache.logging.log4j:log4j-slf4j2-impl
org.apache.sshd:sshd-sftp
org.apache.tika:tika-core
org.apache.tika:tika-langdetect-optimaize
org.apache.tika:tika-parent
org.apache.tika:tika-parser-scientific-module
org.apache.tika:tika-parser-sqlite3-module
org.apache.tika:tika-parsers-standard-package
org.assertj:assertj-core
org.awaitility:awaitility
org.bouncycastle:bcprov-jdk18on
org.glassfish.jersey.containers:jersey-container-grizzly2-http
org.glassfish.jersey.core:jersey-client
org.glassfish.jersey.inject:jersey-hk2
org.glassfish.jersey.media:jersey-media-json-jackson
org.glassfish.jersey.media:jersey-media-multipart
org.jetbrains.kotlin:kotlin-stdlib
org.jetbrains:annotations
org.mockftpserver:MockFtpServer
org.ow2.asm:asm
org.pf4j:pf4j
org.testcontainers:testcontainers
org.testcontainers:testcontainers-elasticsearch
org.testcontainers:testcontainers-minio
org.testcontainers:testcontainers-nginx
org.wiremock:wiremock-standalone
org.yaml:snakeyaml
rest (Java)
Dependencies (10)
com.fasterxml.jackson.module:jackson-module-jaxb-annotations
fr.pilato.elasticsearch.crawler:fscrawler-core
fr.pilato.elasticsearch.crawler:fscrawler-plugin
fr.pilato.elasticsearch.crawler:fscrawler-test-documents
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
javax.xml.bind:jaxb-api
org.glassfish.jersey.containers:jersey-container-grizzly2-http
org.glassfish.jersey.inject:jersey-hk2
org.glassfish.jersey.media:jersey-media-json-jackson
org.glassfish.jersey.media:jersey-media-multipart
settings (Java)
Dependencies (7)
com.github.gestalt-config:gestalt-core
com.github.gestalt-config:gestalt-json
com.github.gestalt-config:gestalt-yaml
fr.pilato.elasticsearch.crawler:fscrawler-framework
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
jakarta.annotation:jakarta.annotation-api
org.yaml:snakeyaml
test-framework (Java)
Dependencies (8)
com.carrotsearch.randomizedtesting:randomizedtesting-runner
commons-io:commons-io
org.apache.logging.log4j:log4j-core
org.apache.logging.log4j:log4j-jcl
org.apache.logging.log4j:log4j-jul
org.assertj:assertj-core
org.awaitility:awaitility
org.testcontainers:testcontainers-elasticsearch
tika (Java)
Dependencies (11)
fr.pilato.elasticsearch.crawler:fscrawler-beans
fr.pilato.elasticsearch.crawler:fscrawler-framework
fr.pilato.elasticsearch.crawler:fscrawler-settings
fr.pilato.elasticsearch.crawler:fscrawler-test-documents
fr.pilato.elasticsearch.crawler:fscrawler-test-framework
io.opentelemetry:opentelemetry-api
org.apache.tika:tika-core
org.apache.tika:tika-langdetect-optimaize
org.apache.tika:tika-parser-scientific-module
org.apache.tika:tika-parser-sqlite3-module
org.apache.tika:tika-parsers-standard-package
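The fr.pilato.elasticsearch.crawler:fscrawler-* entries in these lists form an internal module graph. As a minimal sketch (not part of FSCrawler itself), the Java class below encodes a subset of those edges, copied from the lists with the "fscrawler-" prefix dropped, and derives one valid build order using Kahn's topological sort. The class name `ModuleOrder`, the alphabetical tie-break, and the choice of modules are illustrative assumptions; Maven scopes (for example, whether test-framework is only a test dependency) are ignored, and modules without a section of their own (such as test-documents) are treated as having no internal dependencies.

```java
import java.util.*;

public class ModuleOrder {
    // Hypothetical subset of the internal edges listed above ("fscrawler-" prefix dropped).
    // Scopes are ignored: a test-only dependency is treated like any other edge.
    static final Map<String, List<String>> DEPS = new TreeMap<>();
    static {
        DEPS.put("test-framework", List.of());
        DEPS.put("framework", List.of("test-framework"));
        DEPS.put("settings", List.of("framework", "test-framework"));
        DEPS.put("beans", List.of("framework", "settings", "test-documents", "test-framework"));
        DEPS.put("plugin", List.of("beans", "settings"));
        DEPS.put("tika", List.of("beans", "framework", "settings", "test-documents", "test-framework"));
        DEPS.put("elasticsearch-client", List.of("beans", "framework", "settings", "test-framework"));
        DEPS.put("core", List.of("elasticsearch-client", "framework", "plugin", "settings",
                "test-documents", "test-framework", "tika"));
        DEPS.put("rest", List.of("core", "plugin", "test-documents", "test-framework"));
    }

    /** Kahn's algorithm: every module appears after all of its dependencies. */
    static List<String> buildOrder(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new TreeMap<>();          // number of unmet dependencies
        Map<String, List<String>> dependents = new HashMap<>();  // reverse edges
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String d : e.getValue()) {
                pending.putIfAbsent(d, 0);  // modules with no section of their own, e.g. test-documents
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        PriorityQueue<String> ready = new PriorityQueue<>(); // alphabetical tie-break among ready modules
        pending.forEach((m, n) -> { if (n == 0) ready.add(m); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String m = ready.poll();
            order.add(m);
            for (String dependent : dependents.getOrDefault(m, List.of())) {
                if (pending.merge(dependent, -1, Integer::sum) == 0) ready.add(dependent);
            }
        }
        if (order.size() != pending.size()) throw new IllegalStateException("cycle detected");
        return order;
    }

    public static void main(String[] args) {
        // prints [test-documents, test-framework, framework, settings, beans, elasticsearch-client, plugin, tika, core, rest]
        System.out.println(buildOrder(DEPS));
    }
}
```

Kahn's algorithm is used here because it also detects cycles: if any module is left with unmet dependencies, the graph cannot be ordered and the method throws.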