I needed to make a local backup of an old Drupal 7 site that has since been upgraded to Drupal 8. The site used search_api_solr with Solr 5, and search_api_attachments with a locally installed Tika jar file to do the extraction. After going through the history on GitHub to find a suitable docker-compose.solr.yaml for Solr 5 and getting it set up, I realized that attachment extraction wasn't working because I no longer had a local Tika install to work with.
My first thought was to use the Tika built into Solr. After a couple of hours of messing with it, I decided it might be easier to get Tika running locally in the web container again. That came with its own set of challenges: getting an old version of Java installed to support Tika turned out to be less than easy. Then I remembered the Tika server option on the Search API Attachments settings page.
A quick Google search led me to Apache Tika-Docker on GitHub.
I added a new docker-compose.tika-server.yaml file to my .ddev folder with the following contents:
```yaml
version: '3.6'
services:
  tika: # This is the service name used when running ddev commands accepting the --service flag
    container_name: ddev-${DDEV_SITENAME}-tika # This is the name of the container. It is recommended to follow the same name convention used in the main docker-compose.yml file.
    image: apache/tika:latest
    restart: "no"
    ports:
      - 9998 # Tika is served from this port inside the container
    labels:
      # These labels ensure this service is discoverable by ddev
      com.ddev.site-name: ${DDEV_SITENAME}
      com.ddev.approot: $DDEV_APPROOT
    environment:
      - VIRTUAL_HOST=$DDEV_HOSTNAME # This defines the host name the service should be accessible from. This will be sitename.ddev.local
      - HTTP_EXPOSE=9998 # This defines the port the service should be accessible from at sitename.ddev.local
  # This links the tika service to the web service defined in the main docker-compose.yml, allowing applications running in the web service to access the tika service at sitename.ddev.local:9998
  web:
    links:
      - tika:$DDEV_HOSTNAME
```
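Before touching the Drupal side, it can help to confirm that the new container actually answers. The following is a minimal Python sketch, assuming the Tika server's standard GET /version REST endpoint and assuming the service is reachable at http://sitename.ddev.local:9998 as the HTTP_EXPOSE comment above describes. "sitename" is a placeholder for your own DDEV project name, and DEFAULT_BASE_URL, version_url and tika_version are hypothetical names used only for illustration.

```python
"""Smoke test: is the Tika server container answering?"""
import urllib.request

# Assumption: "sitename" is a placeholder for your DDEV project name, and the
# port matches HTTP_EXPOSE=9998 in docker-compose.tika-server.yaml.
DEFAULT_BASE_URL = "http://sitename.ddev.local:9998"


def version_url(base_url: str) -> str:
    """Build the URL of the Tika server's /version endpoint."""
    return base_url.rstrip("/") + "/version"


def tika_version(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> str:
    """Return the version string the Tika server reports.

    Raises OSError (urllib.error.URLError and HTTPError are subclasses)
    if the server cannot be reached within *timeout* seconds or answers
    with an error status.
    """
    with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling print(tika_version("http://myproject.ddev.local:9998")) after ddev restart should print the server's version string. A connection error usually means the container isn't running or the host name and port don't match the compose file.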
A quick ddev restart rebuilt my containers with the new Tika container, and all that was left to do was switch the Search API Attachments settings to the Tika server option and point it at the new container.
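With the Tika server option, files are sent to the container over HTTP instead of being handed to a local jar. The following minimal Python sketch shows that kind of request, assuming the Tika server's PUT /tika endpoint with an Accept: text/plain header and the same placeholder base URL as above. It is not necessarily the exact request search_api_attachments sends, and build_extract_request and extract_text are hypothetical helper names.

```python
"""Send one file to a Tika server and get plain text back."""
import urllib.request
from pathlib import Path

# Assumption: placeholder base URL; replace "sitename" with your DDEV project name.
DEFAULT_BASE_URL = "http://sitename.ddev.local:9998"


def build_extract_request(base_url: str, data: bytes) -> urllib.request.Request:
    """PUT the raw file bytes to /tika and ask for plain-text output."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={
            "Accept": "text/plain",
            # Generic type so Tika detects the real format from the bytes,
            # instead of urllib's default form-urlencoded Content-Type.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(path: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 60.0) -> str:
    """Return the text Tika extracts from the file at *path*."""
    req = build_extract_request(base_url, Path(path).read_bytes())
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Running extract_text("some-attachment.pdf") against a single file is a quick way to tell a Tika problem apart from a Drupal configuration problem before starting a full re-index.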
I started a re-index, and the files began to be extracted. I wish I had thought of that approach before wasting 3 or 4 hours on the alternatives. But as with every labyrinth like that, I learned a lot about a lot of things along the way.